Hierarchical softmax

In hierarchical softmax, instead of mapping each output vector to its corresponding word, we consider the output vector as a form of binary tree. Refer to the structure of hierarchical softmax in Figure 6.34:

Figure 6.34: Hierarchical Softmax structure

So, here, the output vector is not making a prediction about how probable the word is, but it is making a prediction about which way you want to go in the binary tree. So, either you want to visit this branch or you want to visit the other branch. Refer to Figure 6.35:

Get Python Natural Language Processing now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.