One-hot encoding

In this context, an encoding technique is a method of converting the values of a dataset, or of a single attribute in a dataset, into a form that data analysis techniques and machine learning models can process more easily. One-hot encoding is one such method, intended specifically for categorical data.
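To make the idea concrete before the worked table, here is a compact definition. The notation is introduced here for illustration and is not taken from the text. If an attribute takes one of k distinct categories c_1, ..., c_k, one-hot encoding maps a value x to a vector of length k that holds a single 1 at the position of x's category and 0 everywhere else:

```latex
\mathrm{onehot}(x) = (e_1, \dots, e_k), \qquad
e_j = \begin{cases} 1 & \text{if } x = c_j \\ 0 & \text{otherwise} \end{cases}
\quad (j = 1, \dots, k)
```

For example, with the hypothetical categories London, Paris and Tokyo (in that order), the value Paris becomes (0, 1, 0). Every encoded vector sums to exactly 1, which is where the name "one-hot" comes from.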

Let's discuss the theory of one-hot encoding first. Say we have a simple table with the following data:

Sample tabular – categorical data

Now, the string data in the City column is not very machine-friendly. There are, of course, a number of machine learning models that will have no problem processing this attribute (for example, random forests), ...
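A minimal sketch of how a column like City can be turned into machine-friendly numeric attributes is shown below. It assumes the table is held as a list of row dictionaries; the `Name` column, the city values and the `one_hot_column` helper are all hypothetical illustration data, not the book's sample table or code.

```python
# Minimal sketch: one-hot encode a "City" column held as a list of row dicts.
# The rows and the one_hot_column helper are hypothetical, for illustration only.

def one_hot_column(rows, column):
    """Replace `column` in each row with one 0/1 field per distinct value."""
    # Collect the distinct categories in a stable, sorted order.
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Keep every other attribute unchanged.
        new_row = {k: v for k, v in row.items() if k != column}
        # Add one field per category; only the field matching this row's value is 1.
        for cat in categories:
            new_row[f"{column}_{cat}"] = 1 if row[column] == cat else 0
        encoded.append(new_row)
    return encoded


rows = [
    {"Name": "A", "City": "Paris"},
    {"Name": "B", "City": "London"},
    {"Name": "C", "City": "Paris"},
]

encoded = one_hot_column(rows, "City")
for row in encoded:
    print(row)
# {'Name': 'A', 'City_London': 0, 'City_Paris': 1}
# {'Name': 'B', 'City_London': 1, 'City_Paris': 0}
# {'Name': 'C', 'City_London': 0, 'City_Paris': 1}
```

In practice a library routine is usually preferable; for instance, pandas' `get_dummies` with `columns=["City"]` produces columns with the same `City_<value>` naming by default.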
