Hierarchical softmax

In both the CBOW and skip-gram models, we used the softmax function to compute the probability of the occurrence of a word. But computing the probability using the softmax function is computationally expensive. Say we are building a CBOW model; we compute the probability of a word $w_i$ in our vocabulary being the target word as:

$$p(w_i \mid \text{context}) = \frac{\exp(u_{w_i}^{\top} h)}{\sum_{j=1}^{V} \exp(u_{w_j}^{\top} h)}$$

Here, $h$ is the hidden-layer vector (the average of the context word vectors), $u_{w_j}$ is the output vector of word $w_j$, and $V$ is the size of the vocabulary.

If you look at the preceding equation, we are basically taking the exponent of the dot product between the word's output vector and the hidden-layer vector $h$, and then dividing it by the sum of the same quantity computed for every word in the vocabulary. That normalizing sum runs over all $V$ words on every training step, which is what makes the softmax so expensive when the vocabulary is large.
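To make the cost concrete, here is a minimal NumPy sketch of the flat softmax for a single CBOW step. The sizes and names (vocab_size, embedding_dim, U, h) are illustrative assumptions, not the book's code; the point is simply that the denominator requires a dot product and an exponential for all V words.

```python
import numpy as np

# Illustrative sizes; not taken from the book's code.
vocab_size = 50000      # V: number of words in the vocabulary
embedding_dim = 300     # dimensionality of the word vectors

# h: hidden-layer vector, i.e. the average of the context word vectors
h = np.random.randn(embedding_dim)

# U: output weight matrix, one output vector u_w per vocabulary word
U = np.random.randn(vocab_size, embedding_dim)

def full_softmax_probability(target_index, h, U):
    """Probability of the target word under the flat softmax.

    The denominator sums exp(u_j . h) over all V words, so every
    training step costs O(V) work just to normalize the distribution.
    """
    scores = U @ h                               # V dot products: u_j . h
    exp_scores = np.exp(scores - scores.max())   # shift for numerical stability
    return exp_scores[target_index] / exp_scores.sum()

p = full_softmax_probability(target_index=42, h=h, U=U)
print(p)
```

Even in this toy setting, a single probability evaluation touches every row of U, which is the overhead that hierarchical softmax is designed to avoid.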
