Chapter 7. Training a Machine Learning Model

In Chapter 5, we learned how to prepare and clean up our data, which is the first step in the machine learning pipeline. Now let’s take a deep dive into how to use our data to train a machine learning model.

Training is often considered the “bulk” of the work in machine learning. Our goal is to create a function (the “model”) that can accurately predict results for inputs it hasn’t seen before. Intuitively, model training is very much like how humans learn a new skill: we observe, practice, correct our mistakes, and gradually improve. In machine learning, we start with an initial model that might not be very good at its job. We then put the model through a series of training steps, in which training data is fed to the model. At each training step, we compare the predictions produced by our model with the true results to see how well the model performed. We then adjust the model’s parameters (for example, by changing how much weight is given to each feature) to try to improve its accuracy. A good model is one that makes accurate predictions without overfitting to a specific set of inputs.
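To make the idea of a training step concrete, here is a minimal sketch in plain Python. It is not taken from the chapter's TensorFlow or Scikit-learn examples. It assumes a one-feature linear model trained by gradient descent on mean squared error, and the `train` function, the toy data, the learning rate, and the step count are all illustrative choices.

```python
# Minimal sketch of a training loop: fit y = w * x + b by gradient descent.
# The model form, data, learning rate, and step count are assumptions
# chosen for illustration only.

def train(xs, ys, learning_rate=0.05, steps=2000):
    w, b = 0.0, 0.0  # initial model: not yet good at its job
    n = len(xs)
    losses = []
    for _ in range(steps):
        # Feed the training data to the model to get predictions.
        preds = [w * x + b for x in xs]
        # Compare the predictions with the true results.
        errors = [p - y for p, y in zip(preds, ys)]
        losses.append(sum(e * e for e in errors) / n)  # mean squared error
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        # Adjust the parameters in the direction that reduces the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, losses

# Toy training data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b, losses = train(xs, ys)
print(f"w={w:.3f}, b={b:.3f}, first loss={losses[0]:.3f}, last loss={losses[-1]:.6f}")
```

On this toy data the learned weight and bias should approach 2 and 1, and the recorded loss should shrink as training proceeds. Real libraries automate the gradient computation and parameter updates, but the loop has the same shape: predict, compare, adjust.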

In this chapter, we are going to learn how to train machine learning models using two different libraries—TensorFlow and Scikit-learn. TensorFlow has native, first-class support in Kubeflow, while Scikit-learn does not. But as we will see in this chapter, both libraries can be easily integrated with Kubeflow. We’ll demonstrate ...
