13 Robust machine learning with ML Pipelines

This chapter covers

Using transformers and estimators to transform data into ML features
Assembling features into a vector through an ML pipeline
Training a simple ML model
Evaluating a model using relevant performance metrics
Optimizing a model using cross-validation
Interpreting a model’s decision-making process through feature weights

In the previous chapter, we set the stage for machine learning: from a raw data set, we tamed the data and crafted features based on our exploration and analysis of the data. Looking back at the data transformation steps from chapter 12, we performed the following work, resulting in a data frame named food_features:

Read a CSV file containing dish names and multiple ...


Data Analysis with Python and PySpark by Jonathan Rioux
