Statistics for Machine Learning

by Pratap Dangeti
July 2017
Beginner to intermediate
442 pages
10h 8m
English
Packt Publishing

Naive Bayes SMS spam classification example

A Naive Bayes classifier is developed here using the SMS spam collection data available at http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/. The NLP techniques discussed in this chapter are used to preprocess the text before building the Naive Bayes model:

>>> import csv 
 
>>> smsdata = open('SMSSpamCollection.txt','r') 
>>> csv_reader = csv.reader(smsdata,delimiter='\t') 
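
As a rough illustration of what the reader yields, the sketch below iterates over tab-separated rows and collects labels and messages into two lists. It assumes the usual layout of the SMS spam collection file (a `ham` or `spam` label, a tab, then the message text). The sample text, the `read_sms` helper, and the use of `quoting=csv.QUOTE_NONE` are illustrative assumptions, not taken from the book:

```python
import csv
import io

# Hypothetical two-line sample in the assumed layout of SMSSpamCollection.txt:
# a label ("ham" or "spam"), a tab, then the message text.
sample = "ham\tOk lar... Joking wif u oni...\nspam\tWINNER!! Claim your prize now\n"

def read_sms(handle):
    """Return parallel lists of labels and messages from a tab-separated file."""
    labels, messages = [], []
    # QUOTE_NONE treats double quotes inside messages as literal text
    # (an assumption; the snippet above uses the default quoting).
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        labels.append(row[0])
        messages.append(row[1])
    return labels, messages

# io.StringIO stands in for the opened file so the sketch runs on its own.
labels, messages = read_sms(io.StringIO(sample))
print(labels)
print(len(messages))
```

With the real data, the opened `smsdata` file object could be passed to `read_sms` in place of the `io.StringIO` wrapper.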

The following sys lines can be used if any UTF-8 errors are encountered while using older versions of Python (Python 2); they are not necessary with Python 3.6 and later:

>>> import sys 
>>> reload(sys) 
>>> sys.setdefaultencoding('utf-8') 
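
On Python 3, `reload` is no longer a builtin and `sys.setdefaultencoding` does not exist, so one alternative is to declare the encoding when opening the file. The sketch below is a minimal illustration of that idea. The temporary file name and the sample message are hypothetical and only make the example self-contained:

```python
import csv
import os
import tempfile

# Write a small hypothetical sample containing a non-ASCII character.
path = os.path.join(tempfile.mkdtemp(), "sample_sms.txt")
with open(path, "w", encoding="utf-8", newline="") as f:
    f.write("ham\tSee you at the café\n")

# Python 3: state the encoding on open() instead of changing the
# interpreter-wide default. newline="" follows the csv module's guidance.
with open(path, "r", encoding="utf-8", newline="") as smsdata:
    rows = list(csv.reader(smsdata, delimiter="\t"))

print(rows)
```

Passing `encoding='utf-8'` explicitly also avoids depending on the platform's default encoding, which can differ between operating systems.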

The regular code continues from here:

ISBN: 9781788295758