We have seen the power of scikit-learn throughout this book, and this chapter will be no different. Let's import the CountVectorizer class so that we can quickly count the occurrences of words and phrases in our text:
# CountVectorizer lives in sklearn's text feature extraction module.
# The feature extraction module as a whole contains many tools built for extracting features from data.
# Earlier, we manually extracted features by applying functions such as num_caps, special_characters, and so on.
# The CountVectorizer class specifically is built to quickly count occurrences of words and phrases within pieces of text.
from sklearn.feature_extraction.text import CountVectorizer
We will start by simply creating an instance of CountVectorizer with ...