Best practice 7 – deciding on whether or not to encode categorical features

If a feature is considered categorical, we need to decide whether we should encode it. This depends on what prediction algorithm(s) we will use in later stages. Naïve Bayes and tree-based algorithms can directly work with categorical features, while other algorithms in general cannot, in which case, encoding is essential.

As the output of the feature generation stage is the input of the model training stage, steps taken in the feature generation stage should be compatible with the prediction algorithm. Therefore, we should look at two stages of feature generation and predictive model training as a whole, instead of two isolated components. The following practical ...

Get Python Machine Learning By Example - Second Edition now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.