In this section, we will go through some deeper details of how we can build a Word2Vec model. As we mentioned previously, our final goal is to have a trained model that is able to generate real-valued vector representations for the input textual data, which are also called word embeddings.

During the training of the model, we will use the maximum likelihood method (https://en.wikipedia.org/wiki/Maximum_likelihood) to maximize the probability of the next word *w _{t}* in the input sentence given the previous words that the model has seen, which we can call *h* (for history).

This maximum likelihood method will be expressed in terms of the softmax function:

*P(w _{t} | h) = softmax(score(w _{t}, h)) = exp{score(w _{t}, h)} / Σ _{w' in Vocab} exp{score(w', h)}*

Here, the *score* function computes a value to represent the compatibility ...
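As a minimal sketch of this idea, the snippet below treats the *score* function as a dot product between a candidate word's embedding and a context vector *h*, then normalizes the scores over a toy vocabulary with the softmax function. The dot-product score, the vocabulary size, and the embedding dimension are all illustrative assumptions, not part of any particular Word2Vec implementation:

```python
import numpy as np

def next_word_probs(embeddings, h):
    # Assumed score function: dot product between each word's embedding
    # and the context (history) vector h.
    scores = embeddings @ h
    # Softmax: subtract the max score first for numerical stability.
    exp_scores = np.exp(scores - scores.max())
    return exp_scores / exp_scores.sum()

# Toy vocabulary of 5 words with 3-dimensional embeddings.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(5, 3))
h = rng.normal(size=3)

probs = next_word_probs(embeddings, h)
print(probs.sum())  # the probabilities over the vocabulary sum to 1
```

During training, maximum likelihood would then push up the probability assigned to the word that actually follows *h*, which in turn adjusts the embeddings.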