Using Pipelines for multiple preprocessing steps
Pipelines are (at least to me) something I don't think about using often, but are useful. They can be used to tie together many steps into one object. This allows for easier tuning and better access to the configuration of the entire model, not just one of the steps.
This is the first section where we'll combine multiple data processing steps into a single step. In scikit-learn, this is known as a Pipeline. In this section, we'll first deal with missing data via imputation; however, after that, we'll scale the data to get a mean of zero and a standard deviation of one.
Let's create a dataset that is missing some values, and then we'll look at how to create a Pipeline:
>>> from sklearn ...