Hands-On Natural Language Processing with Python by Rajalingappaa Shanmugamani, Rajesh Arumugam

One-hot encoding

One-hot encoding is a vectorization technique for labeled data, especially categorical data. With binary labels, the target variables are represented as [0, 1] and [1, 0]. For three classes, the same scheme gives [0, 0, 1], [0, 1, 0], and [1, 0, 0]. This representation extends to any number of categories: each category becomes a vector whose length equals the number of categories, with a single 1 in the position assigned to that category and 0 everywhere else. The main advantage of one-hot encoding is that it treats all categories equally, unlike arbitrary integer labels. For instance, the colors red, green, and blue might be encoded as the integers 0, 1, and 2. Although colors have no intrinsic order, some ML models may treat such input as if it were ordered. This is avoided in one-hot encoding, as it does ...
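
A minimal sketch in plain Python of the color example from the text, assuming categories are assigned positions in order of first appearance (the helper names `build_index` and `one_hot` are hypothetical, not taken from the book). It also contrasts the unequal gaps implied by integer labels with the equal distances between one-hot vectors.

```python
import math


def build_index(labels):
    """Assign each distinct category an integer position, in order of first appearance."""
    index = {}
    for label in labels:
        if label not in index:
            index[label] = len(index)
    return index


def one_hot(label, index):
    """Return a list with a 1 at the label's position and 0 everywhere else."""
    vector = [0] * len(index)
    vector[index[label]] = 1
    return vector


colors = ["red", "green", "blue"]
index = build_index(colors)  # {'red': 0, 'green': 1, 'blue': 2}
encoded = {color: one_hot(color, index) for color in colors}
print(encoded)  # {'red': [1, 0, 0], 'green': [0, 1, 0], 'blue': [0, 0, 1]}

# Integer labels imply unequal gaps: red-green differs by 1, red-blue by 2.
print(abs(index["red"] - index["green"]), abs(index["red"] - index["blue"]))  # 1 2

# One-hot vectors are equidistant: both printed values equal sqrt(2).
print(math.dist(encoded["red"], encoded["green"]), math.dist(encoded["red"], encoded["blue"]))
```

For real datasets, library helpers such as scikit-learn's `OneHotEncoder` or Keras's `to_categorical` perform the same transformation.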