A Naive Bayes classifier has been developed using the SMS Spam Collection data available at http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/. The NLP techniques discussed in this chapter are applied to preprocess the text before building the Naive Bayes model:
>>> import csv
>>> smsdata = open('SMSSpamCollection.txt','r')
>>> csv_reader = csv.reader(smsdata,delimiter='\t')
The following lines, which use the sys package, can be run if any utf-8 errors are encountered with older versions of Python (Python 2). They are not necessary with recent versions such as Python 3.6:
>>> import sys
>>> reload(sys)
>>> sys.setdefaultencoding('utf-8')
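On Python 3, a common alternative is to pass the encoding directly to open() when reading the file. The following is only a minimal sketch under stated assumptions: the helper name read_sms_rows and the two-line sample file are hypothetical stand-ins for illustration, and the real SMSSpamCollection.txt is assumed to be utf-8 encoded with one tab-separated label and message per line.

```python
import csv
import os
import tempfile

def read_sms_rows(path, encoding="utf-8"):
    """Return a list of [label, message] rows from a tab-separated file.

    Hypothetical helper, not part of the original chapter code.
    """
    # newline="" is the setting the csv module documentation recommends
    # for file objects handed to csv.reader
    with open(path, "r", encoding=encoding, newline="") as handle:
        return list(csv.reader(handle, delimiter="\t"))

# Hypothetical two-line sample standing in for SMSSpamCollection.txt;
# it contains non-ASCII characters to exercise the encoding argument
sample = "ham\tSee you at caf\u00e9 tonight\nspam\tWIN a \u00a31000 prize now\n"
sample_path = os.path.join(tempfile.mkdtemp(), "sample_sms.txt")
with open(sample_path, "w", encoding="utf-8", newline="") as handle:
    handle.write(sample)

rows = read_sms_rows(sample_path)
for label, message in rows:
    print(label, "|", message)
```

Stating the encoding explicitly keeps the behavior independent of the platform default, which is why no interpreter-wide setting is changed in this sketch.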
The regular code continues from here:
>>> ...