In both the CBOW and skip-gram models, we used the softmax function to compute the probability of a word occurring. But computing this probability with the softmax function is computationally expensive. Say we are building a CBOW model; we compute the probability that a word in our vocabulary is the target word as:
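In the standard softmax form, and assuming $u_j$ is the score the model assigns to the $j$-th vocabulary word, $\hat{y}_j$ is the predicted probability that word $j$ is the target word, and $V$ is the vocabulary size (notation chosen here for illustration), this reads:

$$
\hat{y}_j = \frac{\exp(u_j)}{\sum_{j'=1}^{V} \exp(u_{j'})}
$$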
If you look at the preceding equation, we are basically dividing the exponential of the target word's score by the ...