To use machine learning, we need a physical use case, and there are millions of them. It can be soccer players running down a field, people walking into a store, the top word searched on your website today, or any other physical use case. What we need to do is collect data on that particular use case and, as I mentioned, we can have the data natively or maybe stream it into a data lake. There are many options.
We then need to combine that data set with some kind of machine learning model, and we do not necessarily know up front which model we are going to use. Significant experimentation goes into the process.
And flipping this over, if our data set is too large and the model does not have enough capacity to capture the patterns in it, then we will see underfitting: the model performs poorly even on the data it was trained on.
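To make underfitting concrete, here is a minimal sketch using NumPy. The data is synthetic (a noisy sine wave), and the choice of a straight line versus a cubic polynomial is an assumption made purely for illustration; the helper name `training_mse` is hypothetical. The point is only that a model with too little capacity leaves a large error even on its own training data.

```python
import numpy as np

def training_mse(x, y, degree):
    """Fit a polynomial of the given degree and return its error on the training data."""
    coeffs = np.polyfit(x, y, degree)       # least-squares polynomial fit
    predictions = np.polyval(coeffs, x)     # evaluate the fitted polynomial
    return float(np.mean((y - predictions) ** 2))

# Synthetic data (an assumption for illustration): one period of a sine wave plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.1, size=x.shape)

mse_line = training_mse(x, y, degree=1)     # too simple for a curved pattern
mse_cubic = training_mse(x, y, degree=3)    # enough capacity to follow the curve

print(f"straight line training MSE: {mse_line:.4f}")
print(f"cubic training MSE:         {mse_cubic:.4f}")
```

The straight line cannot bend to follow the curve, so its training error stays high no matter how much data we give it. That is the signature of underfitting.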
The image below shows the supported algorithms built into Amazon SageMaker.
Algorithms are standardized methods used to train models. A model is a function that maps inputs to a set of predicted outcomes. Building that function from existing data, following the rules the algorithm defines, is called training. Training is what makes machine learning applicable to real-world use cases and lets it provide valuable insights.
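The distinction between algorithm, model, and training can be sketched in a few lines. This example is not tied to SageMaker; it uses plain NumPy, with ordinary least squares standing in as the algorithm. The function name `train_linear_model` and the store-visitor numbers are made up for illustration.

```python
import numpy as np

def train_linear_model(inputs, targets):
    """Training: run the algorithm (least squares) on existing data to build a model."""
    slope, intercept = np.polyfit(inputs, targets, 1)

    def model(x):
        # The model is just a function mapping an input to a predicted outcome.
        return slope * x + intercept

    return model

# Hypothetical existing data: hours since opening vs. people who walked into a store.
hours = np.array([1.0, 2.0, 3.0, 4.0])
visitors = np.array([12.0, 22.0, 32.0, 42.0])

model = train_linear_model(hours, visitors)   # training produces the model
prediction = model(5.0)                       # the model predicts an unseen input
print(f"predicted visitors at hour 5: {prediction:.1f}")
```

Swapping in a different algorithm would change how the function is built, but the result would still be a model used the same way: input in, prediction out.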