December 2018
Beginner to intermediate
684 pages
21h 9m
English
N-grams combine N consecutive tokens. N-grams can be useful for the BoW model because, depending on the textual context, treating something such as data scientist as a single token may be more meaningful than treating it as two distinct tokens: data and scientist.
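To make the idea concrete, here is a minimal sketch in plain Python, with no NLP library. The helper name make_ngrams and the sample sentence are hypothetical, and whitespace splitting stands in for a real tokenizer.

```python
def make_ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple (hypothetical helper)."""
    # Zip the token list against copies of itself shifted by 1..n-1 positions.
    return list(zip(*(tokens[i:] for i in range(n))))

# Whitespace splitting is an assumption; a real pipeline would use a tokenizer.
tokens = "she works as a data scientist".split()
bigrams = make_ngrams(tokens, 2)

# For a bag-of-words vocabulary, each bigram can be added as one joined token.
bow_tokens = tokens + ["_".join(gram) for gram in bigrams]
print(bigrams)
print(bow_tokens)
```

With the bigrams joined in this way, data_scientist becomes a vocabulary entry of its own alongside data and scientist.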
textacy makes it easy to view the n-grams of a given length n that occur at least min_freq times:
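import pandas as pd
from textacy.extract import ngrams

# doc is the spaCy Doc created earlier; count each bigram that occurs at least twice
pd.Series([n.text for n in ngrams(doc, n=2, min_freq=2)]).value_counts()

East Asia          2
Asia Earthquake    2
Tsunami Blog       2
annual Bloggies    2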
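The effect of the min_freq threshold can be illustrated without textacy. The following is a rough sketch under stated assumptions, not textacy's implementation: frequent_ngrams is a hypothetical helper, the token list is made up, and textacy's own token filtering (for example, of stop words and punctuation) is omitted.

```python
from collections import Counter

def frequent_ngrams(tokens, n=2, min_freq=2):
    """Count n-grams and keep those seen at least min_freq times (hypothetical helper)."""
    grams = zip(*(tokens[i:] for i in range(n)))
    counts = Counter(" ".join(gram) for gram in grams)
    return {gram: count for gram, count in counts.items() if count >= min_freq}

# Made-up tokens for illustration only.
tokens = "east asia quake hits east asia coast".split()
result = frequent_ngrams(tokens, n=2, min_freq=2)
print(result)
```

Only the bigram that appears twice survives the threshold; bigrams seen once are dropped.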