February 2018
Intermediate to advanced
262 pages
6h 59m
English
In one-hot encoding, each token is represented by a vector of length N, where N is the size of the vocabulary. The vocabulary is the set of unique words in the document. Each vector contains a single 1 at the position assigned to its token and 0 everywhere else. Let's take a simple sentence and observe how each token is represented as a one-hot encoded vector. The following is the sentence and its associated token representation:
An apple a day keeps doctor away said the doctor.
One-hot encoding for the preceding sentence can be represented in tabular format as follows:
| Token | One-hot vector |
| --- | --- |
| An | 100000000 |
| apple | 010000000 |
| a | 001000000 |
| day | 000100000 |
| keeps | 000010000 |
| doctor | 000001000 |
| away | 000000100 |
| said | 000000010 |
| the | 000000001 |
This table describes the tokens and their ...
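The encoding for the example sentence can be reproduced with a short Python sketch. It is only an illustration, not necessarily the approach used later in the text. It assumes that tokens are whitespace-separated words with the trailing period removed, that case is preserved (so "An" and "a" are distinct tokens), and that vocabulary positions follow the order of first appearance. The helper name `one_hot` is hypothetical.

```python
sentence = "An apple a day keeps doctor away said the doctor."

# Assumption: split on whitespace after dropping the trailing period.
# Case is preserved, so "An" and "a" count as different tokens.
tokens = sentence.rstrip(".").split()

# Vocabulary: unique tokens in order of first appearance.
# "doctor" occurs twice in the sentence but only once in the vocabulary.
vocabulary = list(dict.fromkeys(tokens))


def one_hot(token, vocabulary):
    """Return a list of length len(vocabulary) with a single 1 at the token's index."""
    vector = [0] * len(vocabulary)
    vector[vocabulary.index(token)] = 1
    return vector


# Map each vocabulary token to its vector, written as a string of 0s and 1s.
encoded = {
    token: "".join(str(bit) for bit in one_hot(token, vocabulary))
    for token in vocabulary
}

for token, vector in encoded.items():
    print(f"{token:<6} {vector}")
```

The sentence has ten words but only nine unique tokens, so each vector has length 9. This also shows the main cost of the scheme: vector length grows with the vocabulary size.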