One-hot encoding

The concept is simple. Instead of replacing each category with a number from an enumeration, we add one column to our data for each possible value and set it to 1 if the row has that value and 0 otherwise. The name comes from the fact that only one column in the set is "hot", or selected, for any given row.

We can apply this principle to our example. We can replace the single column, "Material", with five columns, one for each material type in our database. So our column, "Material", becomes "ceramic", "fur", "metal", "plastic", and "wood":

Material  ceramic  fur  metal  plastic  wood
metal     0        0    1      0        0
metal     0        0    1      0        0
metal     0        0    1      0        0
metal     0        0    1      0        0
metal     0        0    1      0        0
fur       0        1    0      0        0
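
The encoding shown in the table can be sketched in a few lines of plain Python. This is a minimal illustration, not the book's own implementation: the names MATERIALS, one_hot, and material_column are assumptions made for this example, and the sample column simply mirrors the six rows of the table.

```python
# Category names are taken from the text; the order fixes the column order.
MATERIALS = ["ceramic", "fur", "metal", "plastic", "wood"]


def one_hot(value, categories=MATERIALS):
    """Return a list with a 1 in the position of `value` and 0 elsewhere."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == category else 0 for category in categories]


# Hypothetical sample column mirroring the rows of the table above.
material_column = ["metal"] * 5 + ["fur"]

# Each original value becomes one row of five 0/1 columns.
encoded = [one_hot(material) for material in material_column]

for material, row in zip(material_column, encoded):
    print(material, row)
```

Each encoded row sums to exactly 1, which is the "only one column is hot" property described above. In practice a library routine would likely replace this hand-written helper, but the logic is the same.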
