October 2018
Keras provides the one_hot() function, which you can use to tokenize and encode a text document. Despite its name, it does not produce a one-hot encoding; instead, it is a wrapper around the hashing_trick() function. The hashing trick converts text into a sequence of integer indexes in a fixed-size hashing space:
from keras.preprocessing.text import one_hot

# documents: the list of text strings to encode
vocab_size = 50
encodeDocuments = [one_hot(doc, vocab_size) for doc in documents]
print(encodeDocuments)
The output of the preceding code is as follows:
[[1, 39], [37, 40], [21, 19], [5, 40], [16], [36], [8, 19], [25, 37], [8, 40], [25, 44, 39, 26]]
Here, the Well Done! and Good Work documents are represented by the vectors [1, 39] and [37, 40], respectively. Also, you can observe ...
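To make the hashing trick more concrete, the following is a minimal pure-Python sketch of the idea, not the Keras implementation itself. The function name hashing_trick_sketch and the sample_docs list are hypothetical and introduced only for illustration. By default, Keras relies on Python's built-in hash function, whose values for strings can change between interpreter runs; this sketch assumes md5 instead, so that the same word always maps to the same index:

```python
import hashlib
import string


def hashing_trick_sketch(text, n):
    """Map each word of `text` to an index in the range [1, n - 1].

    Hypothetical helper for illustration; it only approximates the
    tokenization and hashing that a library function would perform.
    """
    # Lowercase the text and replace punctuation with spaces before splitting.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    words = text.lower().translate(table).split()

    # md5 is an assumption made here for run-to-run stability.
    # Index 0 is left unused, so indexes fall between 1 and n - 1.
    return [
        int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % (n - 1) + 1
        for word in words
    ]


vocab_size = 50
sample_docs = ["Well done!", "Good work"]  # illustrative inputs only
encoded = [hashing_trick_sketch(doc, vocab_size) for doc in sample_docs]
print(encoded)
```

Because the indexes come from a hash taken modulo the size of the hashing space, two different words can collide on the same index. Choosing a vocab_size comfortably larger than the number of distinct words reduces, but does not eliminate, such collisions.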