Feature extraction with scikit-learn

We have seen the power of scikit-learn throughout this book, and this chapter will be no different. Let's import the CountVectorizer class to quickly count the occurrences of phrases in our text:

# CountVectorizer comes from sklearn's text feature extraction module.
# The feature extraction module as a whole contains many tools built for extracting features from data.
# Earlier, we manually extracted features by applying functions such as num_caps, special_characters, and so on.
# CountVectorizer specifically is built to quickly count occurrences of phrases within pieces of text.
from sklearn.feature_extraction.text import CountVectorizer

We will start by simply creating an instance of CountVectorizer with ...
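
As a rough illustration of what such an instance does, here is a minimal sketch that uses CountVectorizer with its default settings on a small made-up corpus. The texts list is hypothetical and not data from the book, and the default parameters are an assumption that may differ from the settings the chapter goes on to use.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus for illustration only (not the book's dataset)
texts = [
    "free money now",
    "call me now",
    "free free prize",
]

# Assumption: default settings (lowercasing, single-word tokens of 2+ characters)
vectorizer = CountVectorizer()

# fit_transform learns the vocabulary and returns a sparse document-term matrix:
# one row per text, one column per distinct token
counts = vectorizer.fit_transform(texts)

# vocabulary_ maps each token to its column index in the matrix
vocab = vectorizer.vocabulary_
dense = counts.toarray()

# Print the per-document count of every token, in column order
for token, col in sorted(vocab.items(), key=lambda item: item[1]):
    print(token, dense[:, col].tolist())
```

Each row of the resulting matrix is a numeric feature vector for one text, which is the form most scikit-learn estimators expect. The matrix is stored sparsely because most tokens are absent from most documents.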

