In this section, we will go through some deeper details of how we can build a Word2Vec model. As we mentioned previously, our final goal is to have a trained model that is able to generate real-valued vector representations for the input textual data, which are also called word embeddings.

During the training of the model, we will use the maximum likelihood method (https://en.wikipedia.org/wiki/Maximum_likelihood) to maximize the probability of the next word *w _{t}* in the input sentence given the previous words that the model has seen, which we can call *h* (for history).

This maximum likelihood method will be expressed in terms of the softmax function:

*P(w _{t} | h) = softmax(score(w _{t}, h)) = exp{score(w _{t}, h)} / Σ _{w' in Vocab} exp{score(w', h)}*

Here, the *score* function computes a value to represent the compatibility ...
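As a minimal sketch of this idea, the snippet below treats the *score* function as a dot product between a candidate word's embedding and a context vector *h*, then normalizes the scores over a toy vocabulary with the softmax function. The dot-product score, the vocabulary size, and the embedding dimension are all illustrative assumptions, not part of any particular Word2Vec implementation:

```python
import numpy as np

def next_word_probs(embeddings, h):
    # Assumed score function: dot product between each word's embedding
    # and the context (history) vector h.
    scores = embeddings @ h
    # Softmax: subtract the max score first for numerical stability.
    exp_scores = np.exp(scores - scores.max())
    return exp_scores / exp_scores.sum()

# Toy vocabulary of 5 words with 3-dimensional embeddings.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(5, 3))
h = rng.normal(size=3)

probs = next_word_probs(embeddings, h)
print(probs.sum())  # the probabilities over the vocabulary sum to 1
```

During training, maximum likelihood would then push up the probability assigned to the word that actually follows *h*, which in turn adjusts the embeddings.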