Before we build our model, we need to prepare our data so it can be provided to our model. We want a feature vector and a class label. In our case, the class label can take two values, positive or negative depending on if the sentence has a positive or a negative sentiment. Words are our features. We will use the bag-of-words model to represent our text as features. In a bag-words-model, the following steps are performed to transform a text into a feature vector:
- Extract all unique individual words from the text dataset. We call a text dataset a corpus.
- Process the words. Processing typically involves removing numbers and other characters, placing the words in lowercase, stemming the words, and removing unnecessary white ...