Machine Learning Pocket Reference
by Matt Harrison
August 2019
Intermediate to advanced
318 pages
English
O'Reilly Media, Inc.

Chapter 19. Pipelines

Scikit-learn uses the notion of a pipeline. With the Pipeline class, you can chain together transformers and a final model, then treat the whole process like a single scikit-learn model. You can even insert custom logic as a step of your own.
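To make the idea concrete before the Titanic example, here is a minimal sketch on synthetic data (generated with make_classification, which is an assumption and not part of the original) that chains a scaler and a logistic regression. The step names "std" and "lr" are illustrative. Calling .fit runs each transformer's fit_transform in order and fits the final model on the result; .predict and .score push new data through the already-fitted transformers.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary-classification data (a stand-in for a real dataset)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Each step is a (name, estimator) pair; all but the last must be transformers
pipe = Pipeline(
    [
        ("std", StandardScaler()),
        ("lr", LogisticRegression()),
    ]
)

pipe.fit(X_train, y_train)        # the scaler is fit on training data only
acc = pipe.score(X_test, y_test)  # test data is scaled with training statistics
preds = pipe.predict(X_test)

# Fitted steps remain accessible by name
scaler = pipe.named_steps["std"]
print(round(acc, 3), preds.shape, scaler.mean_.shape)
```

Because scaling lives inside the pipeline, there is no way to accidentally fit the scaler on the test split.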

Classification Pipeline

Here is an example that wraps the tweak_titanic function in a custom transformer and uses it inside a pipeline:

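>>> import pandas as pd
>>> # IterativeImputer is experimental and must be enabled explicitly
>>> from sklearn.experimental import (
...     enable_iterative_imputer,
... )
>>> from sklearn import impute, preprocessing
>>> from sklearn.base import (
...     BaseEstimator,
...     TransformerMixin,
... )
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.pipeline import Pipeline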

>>> def tweak_titanic(df):
...     df = df.drop(
...         columns=[
...             "name",
...             "ticket",
...             "home.dest",
...             "boat",
...             "body",
...             "cabin",
...         ]
...     ).pipe(pd.get_dummies, drop_first=True)
...     return df

>>> class TitanicTransformer(
...     BaseEstimator, TransformerMixin
... ):
...     def transform(self, X):
...         # assumes X is output
...         # from reading Excel file
...         X = tweak_titanic(X)
...         X = X.drop(columns="survived")
...         return X
...
...     def fit(self, X, y=None):
...         # stateless: nothing to learn
...         return self

>>> pipe = Pipeline(
...     [
...         ("titan", TitanicTransformer()),
...         ("impute", impute.IterativeImputer()),
...         (
...             "std",
...             preprocessing.StandardScaler(),
...         ),
...         ("rf", RandomForestClassifier()),
...     ]
... )
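TitanicTransformer keeps no state: its fit method returns self without learning anything. For stateless logic like this, scikit-learn's FunctionTransformer can wrap a plain function instead of a hand-written class. The sketch below is an alternative, not the approach used in the text; the miniature DataFrame and the helper name prep are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import FunctionTransformer

# Hypothetical miniature of the Titanic data (not the real dataset)
df = pd.DataFrame(
    {
        "pclass": [1, 3, 2, 3],
        "sex": ["female", "male", "male", "female"],
        "age": [29.0, None, 40.0, 22.0],
        "name": ["A", "B", "C", "D"],
        "survived": [1, 0, 0, 1],
    }
)


def prep(X):
    # Drop an unused text column and the target, then one-hot encode
    # the remaining string columns, dropping the first level of each
    X = X.drop(columns=["name", "survived"])
    return pd.get_dummies(X, drop_first=True)


# With validate=False (the default), the DataFrame is passed to prep as-is
ft = FunctionTransformer(prep)
out = ft.fit_transform(df)
print(list(out.columns))
```

One caveat applies to both versions: pd.get_dummies encodes whatever frame it receives, so if a category is absent from a later batch, the output columns will differ from those produced during fit. A stateful encoder such as OneHotEncoder, fit on the training data, avoids that mismatch.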

With a pipeline in hand, we can call .fit and .score on it:

>>> from sklearn.model_selection import (
...     train_test_split,
... )
>>> X_train2, X_test2, y_train2, y_test2 = train_test_split(
...     orig_df,
...     orig_df.survived,
...     test_size=0.3,
...     random_state=42,
... )

>>> pipe.fit(X_train2, y_train2)
>>> pipe.score(X_test2 ...
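Because a pipeline is itself an estimator, it can be handed to other scikit-learn tools. The hedged sketch below uses synthetic data (the dataset and parameter values are assumptions, not from the original) and reuses the step names "std" and "rf" to show how a step's hyperparameter is addressed as step__param and how cross_val_score treats the whole chain as one model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the Titanic features
X, y = make_classification(n_samples=150, n_features=6, random_state=42)

pipe = Pipeline(
    [
        ("std", StandardScaler()),
        ("rf", RandomForestClassifier(random_state=42)),
    ]
)

# A step's hyperparameter is addressed as <step name>__<parameter>
pipe.set_params(rf__n_estimators=50)

# Cross-validation refits the whole chain on each training fold,
# so the scaler never sees the held-out fold
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.round(3), pipe.get_params()["rf__n_estimators"])
```

The same step__param names are what a grid search over a pipeline expects in its parameter grid.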
ISBN: 9781492047537