Converting categorical features to numerical – one-hot encoding and ordinal encoding

In the previous chapter, Predicting Online Ads Click-through with Tree-Based Algorithms, we mentioned how one-hot encoding transforms categorical features into numerical features so that they can be used by the tree algorithms in scikit-learn and TensorFlow. One-hot encoding is not tied to tree-based algorithms, however: we can apply it ahead of any other algorithm that only takes numerical features, so our choice of model is not limited to trees.
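As a reminder of what one-hot encoding produces, here is a minimal sketch in plain Python. The helper name `one_hot_encode`, the sample values, and the choice to order columns by first appearance are assumptions made for illustration; this is not the scikit-learn or TensorFlow implementation.

```python
def one_hot_encode(values):
    """Return (categories, rows) where each row is a 0/1 indicator vector.

    Columns follow the order in which categories first appear,
    which is an assumption of this sketch, not a library convention.
    """
    # dict.fromkeys keeps insertion order and drops duplicates
    categories = list(dict.fromkeys(values))
    column_of = {category: i for i, category in enumerate(categories)}

    rows = []
    for value in values:
        row = [0] * len(categories)   # one column per category
        row[column_of[value]] = 1     # mark only the matching column
        rows.append(row)
    return categories, rows


# Hypothetical sample data
categories, rows = one_hot_encode(["Tech", "Fashion", "Sports", "Tech"])
print(categories)
for row in rows:
    print(row)
```

Each categorical value becomes a vector with a single 1, so the result can be passed to any algorithm that expects numerical input.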

The simplest way to transform a categorical feature with k possible values is to map it to a numerical feature with values from 1 to k. For example, [Tech, Fashion, Fashion, Sports, Tech, Tech, Sports] becomes [1, 2, 2, 3, 1, 1, ...
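The 1-to-k mapping described above can be sketched in a few lines of plain Python. The function name `map_to_integers` is hypothetical, and assigning integers in order of first appearance is an assumption of this sketch; any consistent assignment of the k categories to 1..k would fit the description.

```python
def map_to_integers(values):
    """Map each distinct category to an integer from 1 to k.

    Integers are assigned in order of first appearance
    (an assumption made for this illustration).
    """
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping) + 1  # next unused integer
    encoded = [mapping[value] for value in values]
    return mapping, encoded


interests = ["Tech", "Fashion", "Fashion", "Sports", "Tech", "Tech", "Sports"]
mapping, encoded = map_to_integers(interests)
print(mapping)
print(encoded)
```

Note that the resulting integers carry an order and a distance between categories that a model may pick up on, even when the categories themselves have no natural ranking.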
