November 2016
Beginner to intermediate
687 pages
15h 31m
English
We can also analyze performance at the word (lexical) level.
Consider the following NLTK code, in which movie reviews are loaded and labeled as either positive or negative. A feature extractor is then constructed that checks whether each given word is present in a document:
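>>> import random
>>> import nltk
>>> from nltk.corpus import movie_reviews
>>> # Pair each review's word list with its category label
>>> docs = [(list(movie_reviews.words(fileid)), category)
...         for category in movie_reviews.categories()
...         for fileid in movie_reviews.fileids(category)]
>>> random.shuffle(docs)
>>> # Frequency distribution over all lowercased words in the corpus
>>> all_wrds = nltk.FreqDist(w.lower() for w in movie_reviews.words())
>>> word_features = list(all_wrds)[:2000]
>>> def doc_features(doc):
...     doc_words = set(doc)  # set membership tests are faster than list lookups
...     features = {}
...     for word in word_features:
...         features['contains({})'.format(word)] ...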
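The listing above is cut off inside the loop, so the following is only a minimal sketch of how a word-presence feature extractor of this kind might look when complete. It makes several assumptions: each feature is taken to be a boolean recording whether the word occurs in the document (as the description above suggests), `toy_docs` is a hypothetical three-review corpus standing in for `movie_reviews`, and `collections.Counter` stands in for `nltk.FreqDist` so that the example runs without downloading any corpus. The vocabulary is also chosen with `most_common`, which selects words by frequency rather than by slicing the list of keys.

```python
from collections import Counter

# Hypothetical toy corpus standing in for movie_reviews: (tokens, label) pairs.
toy_docs = [
    ("a gripping and moving film".split(), "pos"),
    ("a dull and lifeless film".split(), "neg"),
    ("moving performances and a gripping plot".split(), "pos"),
]

# Count every lowercased word across the corpus.
# Counter is used here as a stand-in for nltk.FreqDist (an assumption).
word_counts = Counter(w.lower() for tokens, _ in toy_docs for w in tokens)

# Keep the 5 most frequent words as the feature vocabulary
# (the listing above keeps 2000 on the full corpus).
word_features = [w for w, _ in word_counts.most_common(5)]


def doc_features(doc, vocabulary=word_features):
    """Map a tokenized document to {'contains(word)': bool} features."""
    doc_words = set(doc)  # set lookup is O(1) per vocabulary word
    return {
        'contains({})'.format(word): (word in doc_words)
        for word in vocabulary
    }


# Features for the second (negative) toy review.
feats = doc_features(toy_docs[1][0])
for name, present in sorted(feats.items()):
    print(name, present)
```

Dictionaries of this shape, paired with their labels as `(features, label)` tuples, are the kind of input that NLTK classifiers such as `nltk.NaiveBayesClassifier.train` accept.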