August 2019
Intermediate to advanced
342 pages
9h 35m
English
As a concluding example, we will build a Naive Bayes classifier using MultinomialNB from the sklearn.naive_bayes module. As usual, we will split the original dataset, the spam message archive in CSV format, reserving 30% of it for the test subset and the remaining 70% for the training subset.
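The 70/30 split described above can be sketched with train_test_split from sklearn.model_selection. This is a minimal illustration, not the book's own listing: the toy messages, the labels, and the random_state and stratify settings are assumptions standing in for the CSV archive.

```python
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the spam archive in CSV format.
messages = [
    "win a free prize now", "claim your cash reward", "free entry to win cash",
    "urgent prize waiting for you", "cheap loans call now",
    "are we still meeting today", "see you at lunch", "call me when you arrive",
    "thanks for the notes", "happy birthday to you",
]
labels = ["spam"] * 5 + ["ham"] * 5

# test_size=0.3 reserves 30% of the rows for testing and 70% for training.
# stratify keeps the spam/ham ratio similar in both subsets (an assumption,
# not something the text prescribes).
X_train, X_test, y_train, y_test = train_test_split(
    messages, labels, test_size=0.3, random_state=0, stratify=labels
)

print(len(X_train), len(X_test))
```

Fixing random_state makes the split reproducible between runs, which helps when comparing classifiers on the same data.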
The data will be processed with the bag of words (BoW) technique, which assigns a number to each word identified in the text. To do this, we use sklearn's CountVectorizer, to which we pass get_lemmas(), which returns the individual tokens extracted from the text of the messages.
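As a rough sketch of the BoW step, the snippet below passes a tokenizing callable to CountVectorizer through its analyzer parameter. The get_lemmas() body shown here is a hypothetical stand-in that only lowercases and splits on whitespace; the real function is not shown in this excerpt, and passing it via analyzer is an assumption.

```python
from sklearn.feature_extraction.text import CountVectorizer


def get_lemmas(text):
    # Hypothetical stand-in: lowercase whitespace tokens, no real lemmatization.
    return text.lower().split()


messages = ["Free prize now", "Call me now"]

# With a callable analyzer, CountVectorizer uses it to extract the tokens,
# then maps each distinct token to a column index and counts occurrences.
vectorizer = CountVectorizer(analyzer=get_lemmas)
counts = vectorizer.fit_transform(messages)

print(vectorizer.vocabulary_)  # token -> column index
print(counts.toarray())        # one row per message, one column per token
```

Each row of the resulting matrix is one message, and each column holds the number of times the corresponding token appears in it.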
Finally, we will normalize and weight the data using TfidfTransformer ...
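One way to chain the steps is a scikit-learn Pipeline that feeds word counts into TfidfTransformer and then into MultinomialNB. The sketch below is illustrative only: the toy training data, the step names, and the simplified get_lemmas() are assumptions, and the book's own code may be organized differently.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline


def get_lemmas(text):
    # Hypothetical stand-in for the tokenizer described in the text.
    return text.lower().split()


# Hypothetical toy training data, not the book's spam archive.
train_messages = [
    "win a free prize now", "claim your cash reward", "free entry to win cash",
    "are we still meeting today", "see you at lunch", "thanks for the notes",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

spam_classifier = Pipeline([
    ("bow", CountVectorizer(analyzer=get_lemmas)),  # raw token counts
    ("tfidf", TfidfTransformer()),                  # TF-IDF weighting, L2-normalized rows
    ("nb", MultinomialNB()),                        # multinomial Naive Bayes
])
spam_classifier.fit(train_messages, train_labels)

new_messages = ["free prize waiting", "see you at lunch"]
predictions = spam_classifier.predict(new_messages)
print(list(predictions))
```

Wrapping the steps in a Pipeline ensures that the vocabulary and the IDF weights are learned from the training data only and then reused unchanged at prediction time.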