February 2018
Intermediate to advanced
378 pages
10h 14m
English
Note the tuples (u'Paris', 'NNP'), (u'Exposition', 'NNP'), and (u'Americans', 'NNPS'). NNP stands for proper noun (singular) and NNPS for proper noun (plural). We need to lowercase every word that is not a proper noun, and get rid of all punctuation marks and numbers:
In [25]: # tags_to_delete = ['$', "''", "(", ")", ",", "--", ".", ":", "CC"]
         tags_to_not_lowercase = set(['NNP', 'NNPS'])
         tags_to_preserve = set(['JJ', 'JJR', 'JJS', 'NN', 'NNP', 'NNPS', 'NNS',
                                 'RB', 'RBR', 'RBS', 'UH', 'VB', 'VBD', 'VBG',
                                 'VBN', 'VBP', 'VBZ'])

In [26]: print(pos_sentences[203])
[(u'Everybody', 'NN'), (u'wa', 'VBZ'), (u'going', 'VBG'), (u'to', 'TO'), (u'the', 'DT'), (u'famous', 'JJ'), (u'Paris', 'NNP'), (u'Exposition', 'NNP'), (u'--', ':'), (u'I', 'PRP'), (u',', ...
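The preview cuts off before it shows how the two tag sets are applied, so the following is only a minimal sketch of one plausible way to use them, not the book's own code. The helper name clean_tagged_sentence and the hand-typed sample list are hypothetical; the sample mirrors the visible part of pos_sentences[203] so the snippet runs without NLTK.

```python
# Tag sets as defined in the text above.
tags_to_not_lowercase = {'NNP', 'NNPS'}
tags_to_preserve = {'JJ', 'JJR', 'JJS', 'NN', 'NNP', 'NNPS', 'NNS',
                    'RB', 'RBR', 'RBS', 'UH', 'VB', 'VBD', 'VBG',
                    'VBN', 'VBP', 'VBZ'}


def clean_tagged_sentence(tagged_sentence):
    """Hypothetical helper: filter and lowercase one POS-tagged sentence.

    Keeps only tokens whose tag is in tags_to_preserve (which drops
    punctuation, numbers, determiners, pronouns, and so on) and lowercases
    every kept token except proper nouns (NNP, NNPS).
    """
    cleaned = []
    for word, tag in tagged_sentence:
        if tag not in tags_to_preserve:
            continue  # skip punctuation, numbers, and other unwanted tags
        if tag in tags_to_not_lowercase:
            cleaned.append(word)          # proper nouns keep their capitals
        else:
            cleaned.append(word.lower())  # everything else is lowercased
    return cleaned


# Hand-typed stand-in for the visible part of pos_sentences[203].
sample = [('Everybody', 'NN'), ('wa', 'VBZ'), ('going', 'VBG'),
          ('to', 'TO'), ('the', 'DT'), ('famous', 'JJ'),
          ('Paris', 'NNP'), ('Exposition', 'NNP'), ('--', ':'),
          ('I', 'PRP')]

print(clean_tagged_sentence(sample))
# ['everybody', 'wa', 'going', 'famous', 'Paris', 'Exposition']
```

Filtering by an allow-list of tags, rather than by the commented-out tags_to_delete list, means any tag that is not explicitly wanted is dropped, including number tags such as CD.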