August 2019
Intermediate to advanced
318 pages
4h 40m
English
Scikit-learn uses the notion of a pipeline. Using the Pipeline class, you can chain together transformers and models, and treat the whole process like a scikit-learn model. You can even insert custom logic.
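Here is a minimal sketch of the idea, separate from the Titanic data. It assumes synthetic NumPy data and uses `StandardScaler` plus `LogisticRegression` only as placeholder steps. Every step except the last must be a transformer. Calling `.fit` on the pipeline fits and transforms through each step in order and then fits the final model. `.predict` and `.score` first send new data through the already fitted transformers.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for illustration): 200 rows, 3 features,
# with the label determined by the sign of the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)

# Each step is a (name, estimator) pair; only the last may be a plain model.
pipe = Pipeline([
    ("std", StandardScaler()),
    ("lr", LogisticRegression()),
])

# fit(): StandardScaler.fit_transform, then LogisticRegression.fit on the result.
pipe.fit(X, y)

# predict()/score(): data goes through the fitted scaler's transform first.
preds = pipe.predict(X[:5])
acc = pipe.score(X, y)  # mean accuracy, as for any scikit-learn classifier
step_names = [name for name, _ in pipe.steps]
```

Because the pipeline exposes `fit`, `predict`, and `score`, any code that accepts a scikit-learn model can accept the whole pipeline.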
Here is an example using the tweak_titanic function inside of a pipeline:
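>>> import pandas as pd
>>> from sklearn.experimental import (
...     enable_iterative_imputer,  # must be imported to use IterativeImputer
... )
>>> from sklearn import impute, preprocessing
>>> from sklearn.base import (
...     BaseEstimator,
...     TransformerMixin,
... )
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.pipeline import Pipeline
>>> def tweak_titanic(df):
...     df = df.drop(
...         columns=[
...             "name",
...             "ticket",
...             "home.dest",
...             "boat",
...             "body",
...             "cabin",
...         ]
...     ).pipe(pd.get_dummies, drop_first=True)
...     return df
>>> class TitanicTransformer(
...     BaseEstimator, TransformerMixin
... ):
...     def transform(self, X):
...         # assumes X is output
...         # from reading Excel file
...         X = tweak_titanic(X)
...         X = X.drop(columns="survived")
...         return X
...
...     def fit(self, X, y=None):
...         # nothing to learn; returning self lets Pipeline chain calls
...         return self
>>> pipe = Pipeline(
...     [
...         ("titan", TitanicTransformer()),
...         ("impute", impute.IterativeImputer()),
...         (
...             "std",
...             preprocessing.StandardScaler(),
...         ),
...         ("rf", RandomForestClassifier()),
...     ]
... )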
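The custom step relies on two mixins. `TransformerMixin` provides `fit_transform`, which calls `fit` and then `transform`. `BaseEstimator` provides `get_params`/`set_params` by reading the constructor arguments. The following is a hedged sketch of that mechanism on a tiny made-up DataFrame. The class name `DropAndEncode`, its `drop` parameter, and the data are hypothetical and are not from the Titanic example.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class DropAndEncode(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: drop listed columns, one-hot encode the rest."""

    def __init__(self, drop=("name",)):
        # BaseEstimator.get_params() looks up attributes named after the
        # constructor arguments, so store them unchanged.
        self.drop = drop

    def fit(self, X, y=None):
        # Nothing is learned from the data; return self for chaining.
        return self

    def transform(self, X):
        X = X.drop(columns=list(self.drop))
        # drop_first=True keeps k-1 dummy columns per categorical column.
        return pd.get_dummies(X, drop_first=True)


# Made-up rows standing in for a few Titanic-like columns.
df = pd.DataFrame({
    "name": ["Allen", "Brown", "Cole"],
    "sex": ["female", "male", "male"],
    "age": [29.0, 35.0, 2.0],
})

tr = DropAndEncode(drop=["name"])
out = tr.fit_transform(df)   # inherited from TransformerMixin
params = tr.get_params()     # inherited from BaseEstimator
```

Here `out` has the columns `age` and `sex_male`. `get_dummies` places the untouched numeric columns first and drops the first category (`female`). Giving `fit` a default of `y=None` matters, because `fit_transform` calls `fit(X)` without a target when none is passed.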
With a pipeline in hand, we can call .fit and .score on it:
>>> from sklearn.model_selection import (
...     train_test_split,
... )
>>> X_train2, X_test2, y_train2, y_test2 = train_test_split(
...     orig_df,
...     orig_df.survived,
...     test_size=0.3,
...     random_state=42,
... )
>>> pipe.fit(X_train2, y_train2)
>>> pipe.score(X_test2 ...
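The following is a self-contained sketch of the same split, fit, and score flow that runs without the Titanic spreadsheet. It assumes a synthetic DataFrame standing in for `orig_df`: two numeric columns, some missing values in one of them, and a `survived` label. The hypothetical `DropTarget` step plays the role of `TitanicTransformer` by removing the label column, which is passed in as part of `X`. Fixed `random_state` values are an added assumption that makes the run repeatable.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import RandomForestClassifier
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class DropTarget(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for TitanicTransformer: remove the label column."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X.drop(columns="survived")


# Synthetic stand-in for orig_df (an assumption, not the real data).
rng = np.random.default_rng(42)
n = 300
df = pd.DataFrame({"a": rng.normal(size=n), "b": rng.normal(size=n)})
df["survived"] = (df["a"] > 0).astype(int)
# Knock out 30 values of "b" so the imputer has work to do.
df.loc[rng.choice(n, size=30, replace=False), "b"] = np.nan

pipe = Pipeline([
    ("drop", DropTarget()),
    ("impute", IterativeImputer(random_state=0)),
    ("std", StandardScaler()),
    ("rf", RandomForestClassifier(random_state=0)),
])

# As in the text, the whole frame (label included) is passed as X.
X_train, X_test, y_train, y_test = train_test_split(
    df, df.survived, test_size=0.3, random_state=42
)
pipe.fit(X_train, y_train)        # imputer and scaler learn from training rows only
acc = pipe.score(X_test, y_test)  # mean accuracy on the held-out 30%
```

Because imputation and scaling happen inside the pipeline, their statistics come from the training split alone and are then applied unchanged to the test split.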