Hi, I'm Ankit. I'm currently enrolled in AWS Training and Certification.
Transcript
Welcome. Welcome to an introduction to machine learning.
I'm Blaine Sundrud.
I'm a Senior Instructional Designer and a Technical
Trainer at Amazon Web Services, AWS.
I am thrilled to welcome you to one of
the first sessions of today's AI Innovate event.
Over the next hour,
I'm going to help you lay
the groundwork that you're going to need to start
developing machine learning solutions in your own work.
I'll define some key terms.
We're going to talk about the different types of
machine learning algorithms that'll help
you solve business problems that
you're going to face out in the real world.
Then we're going to walk through
the machine-learning pipeline.
Let's start this 60-minute journey off with a story.
This is a use case of machine learning in action,
one that took place right here at Amazon.
Let me head over to the light board
to illustrate the story for you.
Hold on a second. All right.
Several years ago, Amazon, amazon.com,
needed to improve the way
that it routed customer service calls.
So it looked to machine learning for help.
Now the original routing system
worked something like this:
a customer calls in and is greeted by the menu,
"Press 1 for returns.
Press 2 for Kindle.
Press 3 for..." whatever. You get the idea.
The customer has to make a selection,
and then they're sent to an agent.
The agent is there to go ahead and help,
and they're trained in the right skills
to help that customer.
Well, the problem is,
as you might have guessed,
with the kind of things that we do and sell here at Amazon,
there's a lot of stuff.
So the list of things that a customer could be
calling about is practically endless.
So if we didn't have
the right option for the customer to pick when they called in,
then the customer is sent to
a generalist as opposed to a specialist.
The generalist has to figure out what they want.
So they send them to
another person who hopefully is the right one.
Maybe it is. Maybe they've got the right skills. Maybe not.
Then they get sent to another one, and so on.
You keep going through all these hops until
eventually you get to
the right person,
the one who's supposed to help out the customer.
Now for some businesses,
that might not be the end of the world.
For Amazon, when you're dealing with
hundreds of millions of customer calls every year,
this path is inefficient.
It costs a lot of money, wastes time,
and worst of all, it's not a good way
to get our customers the help they need.
Well you can probably guess the rest of the story.
Amazon built a system.
They used machine learning to
improve the whole routing system.
So the idea was to get rid of all of
this extra stuff and go
straight to the agent that could help them.
This made the customers happier.
It made the call center agents more productive.
Basically, everyone lives happily ever after.
So I'm going to show you
how Amazon actually did it, how we did it.
We're going to spend the next 60 minutes
walking through this machine learning pipeline,
and explain how we deployed this smarter,
more intelligent, customer service routing system,
this ML system, so that you can
develop your own ML solutions moving forward.
So first, what does
the machine learning pipeline look like anyway?
Well it starts with collecting and integrating your data.
Then you prepare the data,
visualize it for analysis,
then you select the features you want
to use and engineer some as well.
Then you can train your model,
evaluate it and deploy it.
So at this point now,
it's time to turn
our business problems into machine learning problems.
So let's start with the business problem.
So our business problem in this case
is how are we going
to route our customer calls successfully?
The machine learning problem,
we'll get to that in a second.
In fact, before you can do
any of the things we're going to talk about,
we have to decide whether machine learning,
ML, is even the right solution to deploy in the first place.
All right. So at this point,
is machine learning an appropriate solution?
So let's break it down.
Machine learning is a subset
of artificial intelligence or AI.
Machine learning uses data,
and this data is used to train the model,
and the model is then used for predictions.
Can I spell predictions?
Close enough. Good enough.
We can then make those predictions
from huge datasets,
because machine learning's strength lies in
its ability to extract hidden patterns
and structures from this data.
Now, a common use case for machine learning.
Well, let's call it credit card transactions.
So, card transactions.
In this case, we
find the appropriate data because what we're
looking for is to detect fraud.
The appropriate data is mined.
We're identifying patterns among
all of the card transactions,
specifically looking for patterns that
indicate a fraudulent transaction.
With these patterns, you can train
the ML model to predict future transactions as,
yes, fraudulent, or no, not fraudulent.
So with that in mind,
let's return to the question.
Is machine learning an appropriate solution
for the business problem?
Well in this case, it was for the Amazon call center.
They had millions of
historical phone calls as the dataset,
but there was no single indicator that they could use to
get a customer directly to an agent in just one step.
It's more complicated than that.
So we needed to identify patterns
within the whole range of
customer data that could help us
route customers to the right agents in a single step.
Tons of data, tons of data,
all of which needed to be analyzed for
patterns that Amazon could
use to make accurate predictions.
Knock-knock, who's there?
Machine learning. This is exactly
the problem that machine learning was built for.
So in this case,
the machine learning problem that it's actually solving
for is to predict the agent skills.
Now there are different types
of machine learning problems out there.
So hypothetically speaking, let's say
that with our call center problem,
our original goal was just to
predict whether a customer was simply
calling in about their Kindle or
not calling in about their Kindle.
This type of problem is
considered a binary classification.
So it's binary. Let me put it over here.
So binary, Kindle or not Kindle.
Simple as that. There are only two outcomes.
It's a classification problem because we're predicting
a category instead of a real number like a price.
Although it's simple,
this basic classification task
supports a wide variety of elegant,
scalable and actually very powerful business solutions.
For example, is this credit card fraudulent or not?
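To make that concrete, here is a minimal, hedged sketch of a binary classifier in Python with scikit-learn. The transaction features and labels are synthetic, made up purely for illustration; this is not Amazon's actual data or model.

```python
# A minimal sketch of binary classification (fraudulent or not) with scikit-learn.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(seed=42)

# Hypothetical features: [transaction_amount, hour_of_day, is_foreign_merchant]
X = rng.random((1000, 3))
# Hypothetical labels: 1 = fraudulent, 0 = not fraudulent
y = (X[:, 0] + X[:, 2] > 1.2).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)
print("Predicted classes:", clf.predict(X_test[:5]))  # each prediction is a category: 0 or 1
print("Test accuracy:", clf.score(X_test, y_test))
```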
The other type of problem you might
run into is multi-class.
A multi-class solution in this case,
we're still predicting a category,
but it's more than just two outcomes.
It's not just Kindle or not Kindle.
We might be looking at a number of different choices.
The example we're going to walk you through
today is actually a multi-class solution.
There are many ways of classifying the type of skill that
would be needed to solve this particular customer call.
Maybe they're looking for a Kindle,
but maybe it's to return a product,
maybe it's to answer a question about Alexa,
or whatever it might be.
There could be any number,
hundreds of different things it could be.
In these two examples,
binary or multi-class solution,
these are classification problems.
But there's also regression.
In a regression problem,
I'm no longer mapping to a series of defined categories.
Now, I'm looking for continuous values,
numbers: 1, 2, 3, whatever.
An example of a machine learning regression problem:
predicting the price of your company's stock.
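For contrast, here's a minimal regression sketch with synthetic data; the two features standing in for things like prior closing price and trading volume are assumptions, not a real stock model.

```python
# A minimal sketch of regression: predicting a continuous value rather than a category.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(seed=0)
X = rng.random((200, 2))  # hypothetical features, e.g. scaled prior close and volume
y = 50 + 10 * X[:, 0] + 5 * X[:, 1] + rng.normal(0, 0.5, 200)  # continuous target

reg = LinearRegression().fit(X, y)
print(reg.predict(X[:3]))  # real-valued predictions, not classes
```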
Let's get back to our call center problem.
We have determined our machine learning problem.
This is what we're looking for. We've identified it:
it's a multi-class problem.
So we have a whole set of different outputs it could be.
At this stage, now it's time for us to talk to
our domain experts and gather more information.
Time to challenge your assumptions.
During this phase, some of
the questions Amazon asked were,
what exactly did these customer
service agent skills represent?
How much overlap was there between the skills?
Are they similar enough that we might be able
to combine them?
What happens when a customer's
routed to an agent with the wrong skill?
Did that agent stand a chance of
possibly answering the question anyway?
The more questions you ask during this discovery stage,
and the more input that the domain experts
and the different people give you,
the better your model is going to be.
All right. Let's go back to the desk,
put a few more things on this. All right.
Now, it's time to get started with your ML pipeline.
Now, it's all about the data,
and training that data to
enable the models to make your predictions.
Data is everywhere, and because it is everywhere,
it can be collected from multiple sources like the internet,
databases, and other types of storage.
Chances are very good
that some of the data your team collects
is going to be noisy.
Your data is possibly incomplete or even irrelevant.
So wherever it comes from, it will
need to be compiled and integrated.
Most importantly, you have to clean the data.
First, you need to collect and
integrate the data that's relevant to your problem.
No matter what type of data you're collecting,
you're going to need to make sure that you've
got the proper tools and
the knowledge to work with all the different data types.
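As a rough illustration of that collect-and-integrate step, here's a hedged pandas sketch; the file names, formats, and columns (orders.csv, device_ownership.json, customer_id) are hypothetical.

```python
# A minimal sketch of collecting data from multiple sources and integrating it with pandas.
import pandas as pd

orders = pd.read_csv("orders.csv")                 # e.g. a database export (hypothetical file)
devices = pd.read_json("device_ownership.json")    # e.g. an internal service dump (hypothetical)

# Integrate the sources on a shared key, then drop obvious junk.
data = orders.merge(devices, on="customer_id", how="left")
data = data.drop_duplicates().dropna(subset=["customer_id"])
print(data.head())
```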
But let's go back to our call center use case.
The data we needed came from answering
questions like what were the customer's recent orders?
Does the customer own a Kindle?
Are they a prime member?
The historical customer data that answers
questions like these is called features.
You can think of features as your inputs to the problem.
The machine learning model's job
during training is to learn which of
these features are actually
important to make the right prediction for the future.
If the value you're looking for is known,
like in supervised learning,
then that prediction is called a label.
But if the value isn't known,
like in unsupervised learning,
then it's called a target.
We'll talk more about supervised and
unsupervised learning in a bit.
Don't worry about that. For right now,
just know that in our call center example,
our label was the skill
an agent needed to resolve the customer call.
All right. Together, the label and the features,
this makes up a single data point.
This is called an observation.
Stack up a bunch of observations, that's your dataset.
Good data will contain
a signal about the phenomenon you're trying to model.
For instance, let's say
there's a merchant trying to forecast demand for products.
They might track the number of sales they've had. Good start.
But what if they've forgotten to
log when certain products were out of stock?
If you're trying to forecast demand,
it's important to know when you were out of stock,
and therefore, critical to have data that
represents that as one of your features.
Here is a general rule of thumb.
You need at least 10 times the number
of data points as features.
So if you've got five features,
you should have 50 data points
minimum in your training data.
So data preparation, as you can see,
sometimes that very first dataset
is not going to be enough for a good prediction.
As developers, it's important to understand
what data you're missing so that you can access it.
This is where the data preparation phase comes in.
First step, take a small random sample
of your data and you really need to dig into it.
Now, you probably need between 20 and 50 observations,
although again, that depends on how many features you have.
Your job in the data prep phase is to
manually and critically explore the data.
You've got to look at it closely.
Ask yourself questions like this:
what features are there?
That's Step 1. Do they match your expectations?
Is there enough information to make accurate predictions?
If you just looked at it, what would you be able to see?
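One hedged way to do that first manual pass, assuming the integrated data lives in a pandas DataFrame loaded from a hypothetical file:

```python
# A minimal sketch of pulling a small random sample for manual, critical review.
import pandas as pd

data = pd.read_csv("call_center_history.csv")   # hypothetical dataset
sample = data.sample(n=30, random_state=1)      # roughly 20 to 50 observations

print(sample.columns.tolist())   # what features are there?
print(sample.dtypes)             # do the types match your expectations?
print(sample.describe())         # quick summary statistics
print(sample.isna().sum())       # how much is missing?
```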
Here is a good rule of thumb.
If a human could look at
a given data point and guess the correct label,
then an ML algorithm should be successful there too.
Now, you might also want to
critically think about your labels.
Ask yourself, are there any labels that you
want to exclude from
the model for business reasons?
Are there any labels that aren't entirely accurate?
In the call center use case,
we asked some domain experts key questions that
helped inform this part of Amazon's Analysis.
For instance, we would ask,
how much overlap was there between skills?
Were any skills similar enough to be combined?
If we did our homework and
properly answered those types of questions,
we may have been able to simplify our model
by excluding a few labels.
For instance, instead of having labels that
represent multiple Kindle skills,
it might have made sense to just combine those into
one overarching Kindle skill label.
That way, every customer that
had a problem with a Kindle could
be routed to an agent trained in all Kindle issues,
rather than tinker-toy,
little tiny skills here and there.
It can be hard to understand
your data without seeing the data.
That's why you need to do more than
just a manual analysis,
you need a programmatic analysis.
This is what you get when you visualize the data.
I love visualization.
Visualization is great.
It's a technique that helps you
understand the relationships within your dataset.
This leads to better features, better models.
When you can see the data in a chart or plotted out,
it can help unveil previously unseen patterns.
It reveals corrupt data or outliers that you don't want,
properties that could be very
significant in your analysis.
Take the Amazon example.
A programmatic analysis of the label might have
shown 50 percent of the calls were related to returns,
40 percent were for Prime membership,
30 percent related to Kindle, and so on.
Basic stats like these can be powerful methods to
obtain quick feature and
label summaries and understand them.
Two other common visualization
techniques we're going to cover,
histograms and scatter plots. Let's take a look at that.
All right. Let's talk about histograms.
By the way, thanks Tom for pre-drawing one for me.
Histograms are effective visualizations
for spotting outliers in data.
For example, let's say you're visualizing
the distribution of hours per week
your company's employees actually work.
So you're trying to make a prediction about
salaries and you're going to base
that on the number of hours
your full-time employees actually show up to work.
So with this histogram,
you can see that the majority
of your employees are working
between 35 and 55 hours a week.
But you can also see there's a lower outlier over here.
A couple of your employees are working 15 to 20 hours a week.
Well, maybe you have some part-time employees that for
whatever reason got mixed
into your dataset of your full-time employees.
If you want to base
your prediction on full-time employees only,
then it's important to identify and
remove these part-time employees from your dataset.
Well, in this case, you could just delete
the outlier data, or you could cap it
so you don't see any data for any employee who
worked less than 35 hours a week.
This solution would help you ensure you're only
looking at that full-time employee set.
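Here's a minimal sketch of that histogram and the cap, using made-up hours-worked data rather than any real payroll numbers.

```python
# A minimal sketch: histogram of hours worked, with a few part-time outliers mixed in.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=7)
hours = np.concatenate([rng.normal(45, 5, 500),   # full-time employees, roughly 35-55 hours
                        rng.normal(17, 2, 10)])   # a few part-timers that slipped in

plt.hist(hours, bins=30)
plt.xlabel("Hours worked per week")
plt.ylabel("Number of employees")
plt.show()

full_time_only = hours[hours >= 35]   # cap/remove anything under 35 hours a week
```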
But there are other solutions.
For instance, in a multi-class classification problem,
you're going to want to figure out how
to actually combine this
outlier data with other data classes
rather than just ignoring it or deleting it.
For example, in the call center example,
we had multiple Kindle skills.
Well, ultimately, Amazon just decided to combine
specific Kindle skills into
a single general Kindle skill for its model training.
If it's a regression problem,
you can deal with outliers or even missing data
by just assigning a new value using imputation.
Now, imputation is going to make a best guess,
so to speak, as to what the value actually should be.
For instance, you might have a set of data and you can
take a mean, 45.
This 45 is going to
be what I would use in the case of missing data.
So in the salary prediction example,
let's say our data looks something like this.
So Employee 1 is going to actually work,
let's say, 35 hours for the week, and another employee,
E2, is going to work 44 hours for that week.
Then you have Employee 3.
No, there is no data for Employee 3.
Rather than just eliminating it, or worse, putting in a
zero, because Employee 3 did not
work zero hours and that'll mess up your data,
in this case you can simply take your mean, which is 45.
I am bad at drawing on this board.
So we are going to just put that in there and say,
for Employee 3, it's going to
take the mean, which is 45 hours.
Great. That fills in the missing data.
It's not a zero, I'm not ignoring Employee 3,
and I'll still have a weight for
it even though I don't know
what it actually is. It's going to be valuable.
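A minimal imputation sketch with scikit-learn, using just the two observed employees from the board; note that with only 35 and 44 observed, the imputed mean here works out to 39.5 (the 45 on the board presumably came from a larger dataset).

```python
# A minimal sketch of mean imputation: the missing value is replaced with the column mean.
import numpy as np
from sklearn.impute import SimpleImputer

hours = np.array([[35.0], [44.0], [np.nan]])   # E1, E2, E3 (missing)

imputer = SimpleImputer(strategy="mean")
print(imputer.fit_transform(hours))            # the NaN becomes the mean of the observed values
```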
All right. Good. Now, along with histograms,
another visualization tool, scatter plots.
So in this case,
the idea of scatter plots is to visualize
the relationship between the features and the labels,
where what you've got are a whole lot
of different unique points.
It's important to understand if there's
a strong correlation between features and labels.
In this instance,
a scatter plot might actually help us see the correlation
between the number of hours
worked and their income levels.
So in this case,
yeah, it's looking like a strong correlation.
Now, on the flip side,
we might see a weak correlation if we were to use age,
with those points scattered out
here; in that case, nothing of value.
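A minimal scatter-plot sketch with synthetic data, showing one strongly correlated feature and one weakly correlated one:

```python
# A minimal sketch: scatter plots comparing a strong and a weak feature/label correlation.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=3)
hours = rng.uniform(35, 55, 200)
income = 800 * hours + rng.normal(0, 2000, 200)   # made-up strong positive relationship
age = rng.uniform(22, 65, 200)                    # made-up and unrelated to income here

fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.scatter(hours, income); ax1.set_title("Strong correlation")
ax2.scatter(age, income);   ax2.set_title("Weak correlation")
plt.show()
```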
When thinking about data preparation,
keep in mind that if you don't address noisy data,
it's just going to hurt your model's performance.
These types of visualization techniques
and approaches are critical.
Your model will suffer because of
noisy data points like outliers or missing data.
This results in less accurate predictions.
So we've been talking
about how, in order to get accurate predictions,
you have to have clean data. But there's more to it than that.
You need an algorithm
that makes sense for your business problem.
Choosing the right algorithm for the job
is another big step in this part of the ML pipeline.
It can be a challenge for
any machine learning practitioner,
especially given that there are
several hundred algorithms out there.
Now, to help out,
let's talk about these four different categories
of machine learning algorithms.
We've got supervised, unsupervised,
reinforcement, and deep learning.
Let's start here. Supervised learning.
It's a popular type of
machine learning because it's widely applicable
and has several successful applications out in the world.
The focus of supervised algorithms is on learning
patterns by seeing the relationship between
variables and known outcomes.
It's called supervised learning
because there needs to be a supervisor,
a trainer, that can actually
show the engine the right answers, so to speak.
In machine learning, by the way,
a trainer can be any sort of complex system;
it could be a machine, it could be a human,
or some other natural process.
Imagine you're training a Machine Learning model that's
capable of predicting future earthquakes.
In this case, the teacher or
the ultimate source of truth is nature herself.
Like any student,
a supervised algorithm needs to learn by example.
Essentially, it needs a teacher who
uses training data to help it
determine the patterns and relationships
between the inputs and the outputs.
This picture here, for example, it's a car.
So you get a nice little car.
I am bad at drawing cars.
It's got two wheels,
it's got a headlight, you've got to have
a windshield on the front there, that's a car, great.
This one over here,
well, that's a truck.
Okay. Fine. After the training is finished,
a successful learning algorithm
can make the decisions on its own.
You no longer need a teacher to actually
label things as car or truck.
In the end, the output knows that's a car,
that's a truck, it can do it by itself.
The call center use case
is an example of supervised learning.
We trained our model on a bunch
of historical customer data
that included the correct labels
or the customer agent skills.
That enabled the model to make
its own prediction based
on other similar data moving forward.
So, for example, it knows that
this particular call needs someone with a Kindle skill.
We'll talk more about what it means for
an algorithm to determine relationships later,
when we talk about parameters and hyperparameters.
For now, let's go ahead and focus on
the types of algorithms rather than those elements.
Supervised algorithms.
They need good training datasets
and properly labeled observations.
Hang on, I need to emphasize something.
It's really important to know that this type of
machine learning is only successful
if the system we are trying to model it after
is already functioning and easy to observe.
If we want to train a model that labels
cars or trucks or buses or whatever,
then we need to make sure that
the training data is labeled.
If not, then you've got to go through
a large number of photos and
actually label them manually.
Now, if such a human process was not already in place,
then obtaining that ideal training dataset,
it could be problematic and might ultimately be
a reason to not pursue a supervised learning algorithm.
So let's talk about
what happens when there's no teacher in the room.
Okay, I'm gone. Wait a minute,
I still have to be here. Sorry, I thought you'd quit.
No. Hello, here we go.
Sometimes all we've got is just the data.
No provided labels.
There's nobody here telling you what something is.
Can something useful still be learned?
Well, yeah, that's unsupervised learning.
With unsupervised algorithms,
we don't know all the variables.
We don't know the patterns.
So the machine itself simply looks at
the data and tries to create labels all on its own.
A common type of
unsupervised learning, it's called clustering.
This algorithm,
it groups data points into different clusters
based on similar features in order to better
understand the attributes of a specific group or cluster.
For instance, let's say you sell
office supplies to different companies all over the world.
Well, in analyzing customer purchasing habits,
an unsupervised model might actually be
able to identify two different groups.
For each group, there's no need for
a label, but what it finds out is that maybe
this one group is just
purchasing paper and pencils or whatever,
and these turn out to be smaller companies.
Whereas this other cluster is buying
conference tables and chairs and big furniture items,
and it turns out these happen to be your larger companies.
You may not have had these labels initially,
but their purchasing habits alone started
dividing them up into buckets
automatically, or more specifically, the engine did that.
Clustering in this situation
could help you realize that you need
to come up with different marketing strategies
for different types of companies.
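Here's a hedged sketch of that idea with k-means clustering in scikit-learn; the two spending features and the group sizes are invented for illustration.

```python
# A minimal sketch of clustering customers by purchasing habits, with no labels provided.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(seed=5)
small_cos = rng.normal([500, 50], 50, (100, 2))     # mostly paper and pencils (hypothetical)
large_cos = rng.normal([200, 5000], 300, (30, 2))   # conference tables and chairs (hypothetical)
X = np.vstack([small_cos, large_cos])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)            # group assignments discovered without any labels
print(kmeans.cluster_centers_)   # one center per purchasing pattern
```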
Consider fraud detection.
A supervised algorithm could predict
a particular threat that's already been classified.
But the most dangerous attacks
are the ones you don't see coming.
The ones you don't know about.
That is the ones that haven't already been labeled.
To detect an unclassified category
of fraud in the early phases,
like a sudden large order from an
unknown user or a suspicious shipping address,
unsupervised algorithms group malicious actors
into a cluster and then
analyze their connections to other accounts without
knowing the actual labels of the attack originally.
All right. Another algorithm,
it's been gaining popularity
a lot recently, Reinforcement Learning.
Let me put this up on the board here.
So in our example here, we're going to start
with the agent, then an action.
The action goes into the environment.
Finally, we come over here to the state and the reward.
This becomes the loop.
Now unlike those first two types of algorithms,
which both actually have an endpoint, an end state,
this one, reinforcement, continually
improves by mining feedback from previous iterations.
In reinforcement learning, this agent continually
learns through trial and error
as it interacts with the environment.
Reinforcement learning is broadly useful when
the reward of a desired outcome is
known but the path to achieve it isn't.
That path requires a lot
of trial and error to actually discover.
Well, let's think of Pac-Man here.
So you've got Pac-Man.
We get Pac-Man out there.
Pac-Man, fine. In this case,
maybe it's supply chain,
but Pac-Man's more fun.
So we're going to go ahead, and the action
might be: is it going to go left?
Is it going to go right? We don't know.
So depending on whether he goes left or right,
the reward or the state is going to constantly change.
The model is learning,
and it's going to be graded
as opposed to tagged or labeled.
So if I go left,
maybe no, that's bad.
So I'm going to have a score of minus two.
But if I go right, okay, that's good,
that's going to be a plus two, whatever it is.
Think about playing a new board game
but you don't know the rules,
and you might not know the intricacies of the game,
but you just know you got to get
to the other side of the board.
So as you move through the game and you learn
the values of certain actions,
you get more familiar with the space,
left, bad, right, good.
"No, it's fire-breathing dragon.
No, negative two whatever."
Fine. These values you
learn can influence your future behavior.
Well, I'm sure as heck ain't going to do
that move again that's a bad one.
I'm not going to keep moving towards
the dragon or the ghost.
So as a result,
the performance starts to improve.
It gets better based on your past experience. All right.
That's reinforcement, fine.
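To make the loop concrete, here is a toy, hedged sketch of one-state Q-learning with the left/right rewards from the board; it is not a full Pac-Man environment, just the agent-action-reward feedback loop.

```python
# A minimal sketch of the reinforcement loop: try an action, observe a reward, update a value.
import random

rewards = {"left": -2, "right": +2}        # the environment's feedback (from the board)
q_values = {"left": 0.0, "right": 0.0}     # the agent's learned action values
learning_rate, epsilon = 0.1, 0.2

for step in range(200):
    # Mostly exploit what has been learned so far, but occasionally explore.
    if random.random() < epsilon:
        action = random.choice(["left", "right"])
    else:
        action = max(q_values, key=q_values.get)
    reward = rewards[action]
    # Nudge the value estimate toward the observed reward.
    q_values[action] += learning_rate * (reward - q_values[action])

print(q_values)   # "right" ends up with the higher value, so the agent prefers it
```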
Now, let's talk about the deep learning algorithms.
Yeah, here's a buzzword for you, right?
Deep Learning. It's a reinvention
of artificial neural networks.
Now if you're thinking about
a biological neural network,
so here's a neuron, and it connects, and that's fine.
I can't draw a neuron, pretend it's a neuron.
You've got these in the brain. If you think like that,
you're actually on the right track because just like
a biological neural network
where this connects to another nerve, another nerve,
and so on, each neuron is activated when the sum of the
input signals into one neuron
exceeds a particular threshold.
The thing is, a single neuron,
it's not sufficient for
any practical classification needs.
Instead, we combine them into a fully connected set of
layers to produce artificial neural networks.
We call these Multilayer Perceptrons.
So you might start with some inputs,
and each of these can be considered a neuron,
but then you've got a whole lot of
different hidden layers, and each
one of those has its own piece, and so on.
Eventually, you might get to an output,
but it's a collection of all of these.
How deep is deep learning in the real-world?
Some networks can have
thousands of layers of these perceptrons.
As you can imagine,
the computational power required to
train such networks, it's not cheap.
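As a small-scale illustration, here's a hedged sketch of a multilayer perceptron with two modest hidden layers on synthetic data; real deep networks are far larger.

```python
# A minimal sketch of a fully connected multilayer perceptron (two small hidden layers).
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)  # synthetic data

mlp = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0)
mlp.fit(X, y)
print("Training accuracy:", mlp.score(X, y))
```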
One important breakthrough in
deep learning was the invention of
Convolutional Neural Networks or CNNs for short.
These are especially useful for image processing.
Now the main idea of a CNN,
in this case for image processing, is to take
nearby pixels in the image into
account instead of treating them
as entirely separate inputs.
A special operation called a convolution
is applied to entire subsections of the image.
If several convolutional layers
are stacked one after another,
each convolutional layer learns
to recognize patterns that
increase in complexity as it moves through the layers.
Now, we can also take the output of a neuron and feed it
as an input to itself or to neurons of previous layers.
So instead of everything going in one direction,
we actually feed backwards, or maybe into itself.
This is what we call Recurrent Neural Networks.
It's as if the neuron remembers
the output from a previous iteration,
thus creating some memory.
A more complex recurrent network is called an LSTM,
which stands for long short-term memory.
It's commonly used for speech recognition or translation.
That's a conversation for another time.
Feature selection.
This is our next important step, feature selection,
where you get to select which
features you want to use with your model.
What you want to have is
a minimal correlation among your features,
but you want to have the maximum correlation
between the features and the desired output.
So you want to select the features that
correlate to your desired output.
Now part of selecting the best features includes
recognizing when you've got to engineer a feature.
Feature engineering is the process of manipulating
your original data into
new and potentially a lot more useful features.
Feature engineering is arguably
the most critical and time-consuming step
of the ML Pipeline.
It answers questions like,
do the features I'm using
make sense for what I want to predict?
Or how can I
systematically take what I've learned about my features
during the visualization process and
encode that information into new features?
For instance, in looking at
the raw data of our call center use case,
you might have noticed already,
50 percent of the customers were
calling in about tracking a package.
However, after visualization,
you see that 25 percent of those customers
calling in about tracking packages
are actually located in the exact same city.
Now that's a large number.
It's a potentially significant pattern.
In this situation, you could engineer
a feature for
customers tracking packages in specific cities.
This information might reveal patterns
you otherwise wouldn't have seen before.
We've had some features that answered questions like,
what was the customer's most recent order?
What was the time of the customer's most recent order?
Does the customer own a kindle?
When we feed these features
into the model training algorithm,
it can only learn from exactly what we show it.
Here, for instance, we're
showing the model that this purchase
was made at 1:00 PM on Tuesday the 13th.
Well, unless we really want to predict something
extremely specific or we're doing a time series analysis,
that's not really a meaningful feature
we want to feed into our model.
It'd be much more meaningful if we could
transform that timestamp into
a feature that represents
maybe how long ago that order took place.
Knowing, for instance, that your last purchase was
months ago would probably help the model
realize that your last purchase is
probably not the reason you're calling today.
Now obviously, we can engineer that feature just by
taking the difference between
the order date-time and today's date-time.
That's a much more helpful feature.
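A minimal pandas sketch of that transformation; the column names and the fixed "today" timestamp are hypothetical.

```python
# A minimal sketch: turn a raw order timestamp into a "days since last order" feature.
import pandas as pd

orders = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "last_order_time": pd.to_datetime(
        ["2019-08-13 13:00", "2019-03-02 09:30", "2019-08-10 18:45"]),  # made-up timestamps
})

now = pd.Timestamp("2019-08-15")  # stand-in for "today"
orders["days_since_last_order"] = (now - orders["last_order_time"]).dt.days
print(orders[["customer_id", "days_since_last_order"]])
```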
Here's another example we could
use about image classification.
Let's say you wanted to train a model to
identify cars in a picture.
Fine. You can do this by feeding in
raw images of cars and training it to identify the car.
But it won't be that helpful given that
these images are a very complex combination of pixels.
The raw data, that is
the raw images you're going to feed in,
it doesn't include
any higher-level features such as edges,
lines, circles, the patterns that it can recognize.
So during the feature engineering stage,
you can pre-process the data.
This will classify it and
possibly get to more granular features,
and that way you can feed those features back
into the model and get better accuracy.
We'll talk more about accuracy and
precision in a little bit, but that's critical.
Finally ready for training.
The first step you have to take
when you're officially training
your model is to split your data.
Now, splitting the data helps you
ensure that your production data is
similar to your training data, so that your model will, as
a result, be more
generalizable, or applicable outside
of the training environment.
Let's head over the board
so we can investigate this a little
more closely. Here we go.
Once again, thanks Tom for doing the work for me.
Typically, you want to split
your data into three sections:
you've got your training data,
your dev data, and your test data.
Now, training data is going to
include both the features and the labels,
this feeds into the algorithm you've
selected to help produce your model.
The model is then used to make
predictions over a development dataset,
which is where you'll likely
notice things that you'll want to tweak,
and tune, and change.
Then when you're ready,
then you can actually run the test dataset,
which only includes features since
you want the labels to be
what's predicted through the model.
The performance you get here with a test dataset
is then what you can
reasonably expect to see in production.
The amount of data you will
have determines how ultimately you split it up.
But regardless, you'll want to train
your model on as much data as
possible knowing that you're going to need to
reserve some of it for
the dev phase and some for testing.
So if you have a lot of data,
then you can probably split it up into let's
say 70 percent here for training,
and 15 percent for dev,
and another 15 percent for test.
If you have little data, well,
maybe it's 80 percent, 10, and 10;
you'll end up working it out the way you can.
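A hedged sketch of that 70/15/15 split with shuffling, using a synthetic stand-in dataset and two calls to scikit-learn's train_test_split:

```python
# A minimal sketch of a shuffled 70/15/15 train/dev/test split.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=5, random_state=0)  # stand-in data

# First hold out 30%, then cut that remainder in half for dev and test.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.30, shuffle=True, random_state=0)
X_dev, X_test, y_dev, y_test = train_test_split(
    X_rest, y_rest, test_size=0.50, shuffle=True, random_state=0)

print(len(X_train), len(X_dev), len(X_test))   # 700 150 150
```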
Another important thing to note though
as you start splitting up your data,
make sure you randomize it.
This is critical.
You've got to randomize it during
your split to help your model avoid bias.
This is especially true with structured data,
if your data comes in a specific order.
So let's say for example
that your data is listed sequentially.
Well, your model will start to become used to
that structure and it will start to
adapt to this pattern as it learns.
Then eventually when you run
your model against test data,
this pattern of sequential data
will be applied and that'll bias your model.
So effectively, to make sure your model isn't biased,
you need to feed it randomized data.
Now, a popular randomization technique is simply shuffling your data.
Now, if you aren't familiar with that, no worries,
there's a lot of great tools out
there that will help you shuffle your data.
For example, Scikit-learn.
Now, randomizing and splitting your training data
is a critical step in the training process.
A common mistake people make is that
they don't hold out testing data,
and what they end up doing is simply
testing on part of
the data they trained with, the training data.
Well, this doesn't generalize your model;
it will actually lead to either overfitting or
underfitting. Let's talk about that.
Overfitting is where your model
learns the particulars of a dataset too well.
It's essentially memorizing your training data
as opposed to actually learning the relationship
between the features and the labels
so the model can use what it learns in
those relationships to build
patterns to apply to new data in the future.
Remember our stock data from earlier.
Well, the model learns
the pattern here: the stock price
goes up at the end of the month
and then drops at the beginning of the month.
For example, here it's 425 on the 30th and 375 on 4/1.
It might miss other important data
that's likely impacting the price,
such as the fact that April is tax season in this example.
It's clear here that mixing up the rows is going to be
necessary to give the model
an opportunity to learn other things from the data.
It's pretty clear here that we need to look at more,
a lot more dates.
In addition to simply randomizing the data,
it's also very important to collect
as much relevant data as possible because
underfitting on the other hand can occur if you don't
have enough features to model the data properly.
This can again prevent the model from
properly generalizing the data because it
doesn't have enough information to
predict the right answer, to predict correctly.
To really understand overfitting
and underfitting and how to avoid it,
we need to talk about two things; bias and variance.
Think about bias as the gap
between your predicted value and
the actual value, whereas variance
describes how dispersed your predicted values are.
Now, that's a lot of jargon.
So let's actually take a moment
and look at it visually over here.
So a bull's eye, that's
a nice analogy to use here because, generally speaking,
the center of the bull's eye is where you aim your darts.
The center of the bull's eye in
this analogy is the label, or your target,
the value your model is trying to predict.
Each dot is then going to
be a result that you're
model produced during the training.
So let me demonstrate.
So we start with a low bias, low variance model.
Everything's clustered tight and
it's right there in the bull's eye.
I'm getting everything I predict in one area,
there's not a lot of spread.
Now, next, if we go over to
a low variance but a high bias,
so in this case,
I'm not getting everything that I want,
but at least I'm getting
a predictable series of responses.
It's a tight cluster,
I'm just not on the bull's eye.
Now, on the other hand,
a high variance low bias.
Well, in this case,
it means I'm on
target as far as the center of the spread goes,
but the spread is wide,
it's all over the place.
Then high variance, high bias.
Yeah. This is the bad one.
So in this case,
I'm all over the place and I'm not on target.
Ideal? What's the ideal case?
Yeah. You guessed it. You want
the low bias and low variance.
Realistically though?
Yeah. There's a balancing act that's happening here.
Bias and variance both contribute to errors,
but what you're ultimately going for here
is to minimize your prediction error,
not bias or variance specifically.
That's the bias variance trade-off.
Bringing underfitting and overfitting
back into the picture.
Underfitting is where you've got
low variance and high bias.
These models are overly simple and they
can't really see the underlying patterns in the data.
Overfitting. That's the high-variance and low bias.
These models are overly complex,
and while they can detect patterns in the training data,
they're not accurate outside of the training data.
So let's consider our use case as an example.
Say, hypothetically, that we trained our model based
solely on data from customers who already had a Kindle,
a Prime account, and a
package tracking question
at some point during their membership.
So our model could detect a pattern that showed that,
say, 70 percent of
Prime members call in about an Amazon device.
But should the model use this pattern and
try to make any future predictions?
Well, you'd probably say
no and if you did, you'd be correct.
In this example, the model
didn't even consider Alexa-related data,
or what about DeepLens,
or holiday data, or
any number of other types of data points.
Therefore, the model is going to
be underfitted because it
hardly has sufficient information to
predict, at a more granular level,
why Prime members are actually
calling in about an Amazon device.
Now, this is an oversimplified example.
But the point remains:
in testing and production,
our model won't pay attention
to these other missing categories;
it will skew the results toward
only the data that the model was actually trained on.
One technique that can be used to combat
underfitting and overfitting
is called hyperparameter tuning.
In machine learning, there are parameters,
and there are hyperparameters.
Let's go back to the desk and pull up the slides.
Let's talk about parameters briefly.
Now, a parameter is internal to the model, and it's
something the model can learn
or estimate purely off of the data.
An example of a parameter could be the weight of
an artificial neural network
or the coefficients in linear regression.
The model has to have parameters to make predictions,
and most often, these aren't set by humans.
Hyperparameters, on the other hand,
are external to the model
and can't be estimated from the data.
Hyperparameters are set by humans, and typically,
you can't really know
the best value of a hyperparameter in advance,
but you can use trial and error to get there.
Yeah. Think about hyperparameters as the knobs and levers;
you're going to use those to tune
the machine learning algorithm,
and that'll improve its performance.
The right hyperparameters have to be
chosen for the right type of problem.
Here's an example of a hyperparameter.
It could be the learning rate
for training a neural network.
Let's take a look at different types of hyperparameters.
Walking through this part of the process is one of
the most effective ways
of improving your model's performance.
So make sure you take the time to
conduct hyperparameter tuning thoroughly.
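One common way to do that trial and error systematically is a grid search; here's a hedged sketch over a k-nearest neighbors classifier with an illustrative grid of hyperparameter values.

```python
# A minimal sketch of hyperparameter tuning with cross-validated grid search.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)  # synthetic data

param_grid = {"n_neighbors": [3, 5, 11], "weights": ["uniform", "distance"]}  # illustrative knobs
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)   # the hyperparameter combination that scored best
print(search.best_score_)    # its cross-validated accuracy
```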
Speaking of which, now it is time to train your model.
The process of training an ML model involves
providing your algorithm with
training data to learn from.
As mentioned earlier, for supervised learning,
the training data must contain
both the features and the correct prediction,
which again we call labels.
The learning algorithm finds patterns in
the training data that map the features to the label.
So when you show the trained model new inputs,
it'll return accurately predicted labels.
Then you can use the ML model to get
predictions on new data
for which you don't know the label.
For example, let's say you want to train an ML model to
predict if an email is spam or not spam. All right.
You provide your algorithm with training data.
It contains emails and the known labels;
those labels tell it whether each email is spam or not spam.
The algorithm then trains the model
using that data, resulting in a model that
tries to predict whether a new email, which it
hasn't seen before, is spam or not spam.
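A tiny, hedged sketch of that spam example with a bag-of-words vectorizer and a Naive Bayes classifier; the four emails are invented, and a real filter would need far more data.

```python
# A minimal sketch of supervised training: labeled emails in, a spam/not-spam model out.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = ["win a free prize now", "meeting agenda for tuesday",
          "claim your free reward", "lunch with the team tomorrow"]   # made-up examples
labels = ["spam", "not spam", "spam", "not spam"]                     # the known labels

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(emails, labels)

print(model.predict(["free prize waiting"]))   # predicts a label for an unseen email
```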
All right. We did
the same process with our call center example.
We passed along features such
as does the customer own a Kindle,
yes they own it or no they don't,
along with the appropriate label,
in this case the Kindle skill. We put those into our algorithm,
which then learned the relationships
between these inputs and outputs and
spit out a model that could extrapolate
those patterns onto similar datasets.
As explained earlier, after
the initial phase of training your model is done,
you'll need to evaluate how accurate that model is by
taking the development data that you set
aside and running it through the model;
this is going to tell you how
well the model generalizes.
Then the test data can be fed to
the model for the most accurate picture of its predictions.
In fact let's circle back this topic
of accuracy and precision.
While you're evaluating, you want a fit that
generalizes toward unseen problems.
Remember from our earlier discussion about overfitting:
you should not fit the training data to obtain
the maximum accuracy, which
is kind of weird, kind of counterintuitive, right?
I mean, you want a model that predicts accurately
on previously unseen data, that's true.
But remember, if you train your model to be too accurate,
it will be overfit to that specific training data.
For classification problems like
the call center use case we've been dealing with all day,
we're trying to predict if a new observation will be
classified as this customer agent skill
or that customer agent skill.
One of the most effective ways to evaluate
your model's accuracy, precision,
and ability to recall involves looking
at something called a confusion matrix.
Now, the confusion matrix analyzes the model and
shows how many of the data points were
predicted correctly and incorrectly.
So let's take a look here,
in the bottom right,
this is the class one box.
Meaning this represents all of the true positives:
you predicted a one and you got a one.
So great, for our call center case
this could mean that of all the times
you thought you'd predict,
and your model did predict,
for example, needing to route a customer
to a specific agent with strong Alexa skills,
your model did this 1,800 times.
In the top left box,
this is the class zero, class zero box;
this is your true negative.
For instance, with our use case
you might predict the model
will not route calls to the Amazon Fresh department,
and the model in fact did not
route any calls to Amazon Fresh.
The top right box,
now this is your class one, class zero box,
where you predicted in this case a one
but you ended up getting a zero.
Finally, last is the bottom left box, where
you predict a zero but end up getting a one.
To summarize, accuracy is the degree of closeness to
the truth, or the total number of
right predictions divided by
the total number of predictions.
Precision is the ability to reproduce similar results,
and it's defined by
your true positives divided
by the sum of true positives and false positives.
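Here's a hedged sketch of computing those quantities with scikit-learn on made-up true labels and predictions for a single binary skill:

```python
# A minimal sketch: confusion matrix, accuracy, and precision on made-up predictions.
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]   # hypothetical actual outcomes
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]   # hypothetical model predictions

print(confusion_matrix(y_true, y_pred))
# Layout: [[true negatives, false positives],
#          [false negatives, true positives]]
print("Accuracy:", accuracy_score(y_true, y_pred))     # right predictions / all predictions
print("Precision:", precision_score(y_true, y_pred))   # TP / (TP + FP)
```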
All right. At this point
after you've trained your model and you're
satisfied with the accuracy
based on some of the techniques we've talked about,
it's best practice to evaluate how it's
doing by running it against a few different algorithms.
Now, you should consider running it through
a couple different algorithms
within the chosen algorithm category.
So if you're working say with a supervised algorithm
like our classification algorithm for the call center,
you should try the model against
a different classification algorithm,
for example a decision tree algorithm
or the k-nearest neighbors algorithm.
This will give you a better idea of how
to get the best fit and the best results for your model.
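A hedged sketch of that comparison, scoring a decision tree and k-nearest neighbors on the same synthetic multi-class data with cross-validation:

```python
# A minimal sketch: compare two classification algorithms on the same data.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=800, n_features=10, n_classes=3,
                           n_informative=5, random_state=0)   # synthetic multi-class data

for name, model in [("decision tree", DecisionTreeClassifier(random_state=0)),
                    ("k-nearest neighbors", KNeighborsClassifier())]:
    scores = cross_val_score(model, X, y, cv=5)
    print(name, scores.mean())
```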
Okay. Deployment and monitoring, here we are.
You've prepared your data,
you've cleaned your data, you've visualized it.
You've selected your features,
you've split your data and tested your model,
you've tuned it several times,
let's be honest, and after you've done all that,
and you're satisfied with
the model's predictions on unseen data,
it's time to deploy your model into
production so it can begin making your predictions.
One of the primary ML tools for building,
training, and deploying models is Amazon SageMaker.
Amazon SageMaker is fully managed,
it covers the entire end-to-end pipeline
that we've just discussed.
In the build module, SageMaker provides
a hosted environment for you to work with your data;
you can experiment with your algorithms,
and you can visualize the output.
Then the train module actually takes care
of the model training and tuning at high scale.
Then it has the deploy module,
designed to provide
a managed environment for you to host
and test models for inference,
securely and with low latency; the tools are there.
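For orientation only, here is a heavily hedged sketch of that train-then-deploy flow with the SageMaker Python SDK (v2-style parameters); the entry-point script, S3 path, instance types, and framework version are assumptions, so check the SDK documentation for your environment.

```python
# A minimal sketch of training and deploying a scikit-learn model with Amazon SageMaker.
# All names below (train.py, the S3 path, instance types) are hypothetical placeholders.
import sagemaker
from sagemaker.sklearn.estimator import SKLearn

role = sagemaker.get_execution_role()   # assumes this runs inside a SageMaker environment

estimator = SKLearn(
    entry_point="train.py",             # your training script (hypothetical)
    role=role,
    instance_type="ml.m5.large",
    instance_count=1,
    framework_version="0.23-1",         # pick a version your SDK supports
)
estimator.fit({"train": "s3://my-bucket/call-center/train/"})   # hypothetical S3 location

# Deploy the trained model behind a managed, low-latency endpoint, then predict and clean up.
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")
# predictor.predict(new_observations)
# predictor.delete_endpoint()
```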
Now, there are additional tools in
SageMaker that are going to help you label data,
manage your compute costs,
take care of forecasting,
and much more. One of our
machine learning tech leaders here at Amazon
is going to discuss those in a different
session later on today.
You don't want to miss
that one; it's going to be important.
All right. Getting back to deploying and monitoring.
You'll want to remember to monitor
your production data and
retrain your model if necessary,
because a newly deployed model
needs to reflect current production data;
you don't want it to get out of date.
Since data distributions can drift over time,
deploying a model is not a one-time exercise;
it's a continuous process.
You're not going to be out of a job.
It's a good practice to
continually monitor the production data,
and retrain if you find
that the production data distribution
has deviated significantly from
the training data distribution.
No deviation? Then no change is needed.
Evaluating in a production setting
is a little bit different.
Now you've got to have a very concrete success metric
that you can use to measure success.
In our call center use case,
our routing experiments were predicated
on the assumption that the ability
to more accurately predict
skills would reduce the number of transfers.
Now, in production, we
can actually put that assumption to the test.
Well, okay, so that takes us to
the end of the ML pipeline.
If that felt like a little bit of a whirlwind,
that's because it was.
There is a ton that goes into
implementing an ML solution.
It's a process that most often takes
several weeks or months.
Don't be scared of it; we've
really just skimmed the surface.
But now hopefully, you've got
enough of a foundation in the process.
We talked about the key terms,
processes, and concepts to use.
So for the rest of today's event,
you'll be able to really dive deeper into the content,
services, and especially the tools that most interest you.
So with that, I'm Blaine Sundrud.
I hope you got something good out of today,
have an excellent rest of the day,
and I'm going to go ahead and throw it back to you
all. Have a great day.