One-hot encoding

The concept is simple. Instead of replacing a category with an enumeration, we add one column to our data for each possible value of the category, and set it to 1 or 0 depending on whether the row has that value. The name comes from the fact that, for any given row, only one column in the set is "hot", or selected.
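To make the contrast concrete, here is a minimal sketch in Python. The helper names `enumerate_category` and `one_hot`, and the category list itself, are assumptions made for illustration and are not taken from the book's code.

```python
# Illustrative category list (an assumption for this sketch).
CATEGORIES = ["ceramic", "fur", "metal", "plastic", "wood"]

def enumerate_category(value, categories=CATEGORIES):
    """Enumeration: replace the category with its integer index."""
    return categories.index(value)

def one_hot(value, categories=CATEGORIES):
    """One-hot: one 0/1 entry per category, with a single 1 for the match."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]

print(enumerate_category("metal"))  # 2
print(one_hot("metal"))             # [0, 0, 1, 0, 0]
```

The enumeration yields a single integer, which implies an ordering among the categories. The one-hot vector has one slot per category and implies no ordering.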

We can apply this principle to our example. We can replace the single column, "Material", with five columns, one for each material type in our database. So our column, "Material", becomes "ceramic", "fur", "metal", "plastic", and "wood":

Material   ceramic   fur   metal   plastic   wood
metal      0         0     1       0         0
metal      0         0     1       0         0
metal      0         0     1       0         0
metal      0         0     1       0         0
metal      0         0     1       0         0
fur        0         1     0       0         0
fur
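The same expansion can be sketched for a whole column. The function `one_hot_columns` below is a hypothetical helper written for this illustration, not the book's implementation, and the `materials` list simply repeats the six complete rows shown above.

```python
def one_hot_columns(values, categories=None):
    """Return a dict mapping each category name to a list of 0/1 flags.

    If no category list is given, the sorted set of observed values is used.
    """
    if categories is None:
        categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories}

# Category names from the text; the row data is the six complete rows above.
CATEGORIES = ["ceramic", "fur", "metal", "plastic", "wood"]
materials = ["metal"] * 5 + ["fur"]

columns = one_hot_columns(materials, CATEGORIES)

# Print each row: the original material followed by its five 0/1 columns.
for i, material in enumerate(materials):
    print(material, [columns[c][i] for c in CATEGORIES])
```

Passing the full category list explicitly keeps columns such as "ceramic" and "wood" in the output even when no row in the sample uses them. In practice, a library routine such as pandas `get_dummies` or scikit-learn `OneHotEncoder` would typically do this work.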
