To use text in machine learning algorithms, we first have to convert it into a numerical representation, because these algorithms operate on numbers rather than raw strings. This numerical representation is called a feature vector, and a vector can be as simple as a list of numbers. The bag-of-words model is one of the feature-extraction techniques for text: it represents each piece of text by how many times each vocabulary word occurs in it, ignoring word order. Next, we generate a bag of words for our text.
For that, we use the scikit-learn library (imported as sklearn) for Python:
from sklearn.feature_extraction.text import CountVectorizer
We are going to use CountVectorizer to create the bag of words:
corpus = []
len(text.sentences)
for sentence in text.sentences:
    corpus.append(str(sentence))
vectorizer = CountVectorizer()
print(vectorizer.fit_transform(corpus).todense() ...
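To make it easier to read the matrix that CountVectorizer prints, here is a minimal pure-Python sketch of the same idea. It is not scikit-learn's implementation. The function name bag_of_words and the two sample sentences are made up for illustration, and the tokenization (lowercasing, keeping tokens of two or more word characters, sorting the vocabulary alphabetically) is only meant to approximate CountVectorizer's default behaviour.

```python
import re


def bag_of_words(sentences):
    """Return (vocabulary, matrix) with one row of word counts per sentence.

    Hypothetical helper for illustration only; it approximates, but does not
    reproduce, what CountVectorizer does with its default settings.
    """
    # Lowercase each sentence and keep tokens of two or more word characters.
    tokenized = [re.findall(r"\b\w\w+\b", s.lower()) for s in sentences]

    # The sorted vocabulary fixes the order of the columns.
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    index = {word: i for i, word in enumerate(vocabulary)}

    # Build one count vector (feature vector) per sentence.
    matrix = []
    for toks in tokenized:
        row = [0] * len(vocabulary)
        for tok in toks:
            row[index[tok]] += 1
        matrix.append(row)
    return vocabulary, matrix


# Made-up sample sentences, not the corpus used in the text above.
sample_corpus = ["The cat sat on the mat.", "The dog chased the cat."]
vocabulary, matrix = bag_of_words(sample_corpus)

print(vocabulary)  # ['cat', 'chased', 'dog', 'mat', 'on', 'sat', 'the']
for row in matrix:
    print(row)
# [1, 0, 0, 1, 1, 1, 2]
# [1, 1, 1, 0, 0, 0, 2]
```

Each row is the feature vector of one sentence, and each column corresponds to one word of the vocabulary. The word "the" appears twice in both sentences, so the last column holds 2 in both rows.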