Now that we understand how a word vector is built, let's build word vectors using the skip-gram and CBOW models. We will use the airline sentiment dataset, which provides tweet texts along with the sentiment corresponding to each tweet. To generate the word vectors, we will use the gensim package, as follows (the code file is available as word2vec.ipynb on GitHub):
- Install the gensim package:
$ pip install gensim
- Import the relevant packages:
import gensim
import pandas as pd
- Read the airline tweets sentiment dataset, which contains comments (text) related to airlines and their corresponding sentiment. The dataset can be obtained from https://d1p17r2m4rzlbo.cloudfront.net/wp-content/uploads/2016/03/Airline-Sentiment-2-w-AA.csv ...
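As a rough illustration of the reading step, the sketch below loads a tiny in-memory stand-in for the CSV with pandas and splits each tweet into tokens. The sample rows are invented for illustration, and the column names `airline_sentiment` and `text` are assumptions about the real file's layout, so check them against the downloaded data before relying on them.

```python
import io

import pandas as pd

# Hypothetical stand-in for the downloaded CSV. The rows are made up, and the
# column names "airline_sentiment" and "text" are assumptions about the file.
sample_csv = io.StringIO(
    "airline_sentiment,text\n"
    "positive,@VirginAmerica thanks for the great flight\n"
    "negative,@united my flight was delayed again\n"
)

data = pd.read_csv(sample_csv)

# Lowercase each tweet and split on whitespace: one list of tokens per tweet.
tokenized_tweets = [tweet.lower().split() for tweet in data["text"]]

print(data.shape)
print(tokenized_tweets[0])
```

For the real dataset, a local path or URL would be passed to `pd.read_csv` in place of `sample_csv`. Lists of tokens in this shape are the input format that gensim's `Word2Vec` accepts, and its `sg` parameter selects skip-gram (`sg=1`) or CBOW (`sg=0`).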