July 2017
Intermediate to advanced
360 pages
8h 26m
English
Another interesting class provided by scikit-learn is FeatureUnion, which concatenates the outputs of different feature transformations into a single output matrix. The main difference from a pipeline (which can also include a feature union) is that a pipeline applies its steps in sequence and, combined with a grid search, is used to select among alternative scenarios. A feature union instead applies its transformers in parallel and creates a unified dataset in which the different preprocessing outcomes are joined side by side. For example, considering the previous results, we could try to optimize our dataset by joining a PCA projection with 10 components to the 5 best features selected according to the ANOVA F-value. In this way, the dimensionality is reduced to 15 instead of 20:
>>> from sklearn.pipeline import FeatureUnion

>>> steps_fu = [ ...
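The listing above is truncated, so what follows is only a minimal sketch of the idea described in the text, not the book's own code. It assumes a synthetic 20-feature dataset generated with make_classification in place of the data behind "the previous results". The contents of steps_fu are reconstructed from the prose (PCA with 10 components plus SelectKBest with f_classif and k=5), and the sample count, random_state, scaler and SVC settings are illustrative choices. The second half places the feature union inside a Pipeline, as the parenthetical in the text mentions.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: a synthetic dataset with 20 features stands in for the book's data
X, Y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=1000)

# Reconstructed from the prose: 10 PCA components + 5 best ANOVA features
steps_fu = [
    ('pca', PCA(n_components=10)),
    ('kbest', SelectKBest(f_classif, k=5)),
]

fu = FeatureUnion(steps_fu)

# Each transformer is fitted on the same input; their outputs are
# concatenated column-wise into a single matrix (10 + 5 = 15 columns)
X_fu = fu.fit_transform(X, Y)
print(X.shape, '->', X_fu.shape)  # (500, 20) -> (500, 15)

# The feature union can also be a step of a pipeline (illustrative settings)
pipeline = Pipeline([
    ('fu', fu),
    ('scaler', StandardScaler()),
    ('classifier', SVC(kernel='rbf')),
])

# cross_val_score clones the pipeline, so the union is refitted on each fold
scores = cross_val_score(pipeline, X, Y, cv=5)
print(scores.mean())
```

Refitting the whole pipeline inside cross-validation keeps the PCA projection and the ANOVA-based selection from seeing the validation folds, which a single fit on the full dataset would not guarantee.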