December 2018
Each filing is stored as a separate text file, and a master index contains the filing metadata. We extract the most informative sections, namely the following:
The preprocessing notebook shows how to parse and tokenize the text using spaCy, similar to the approach taken in Chapter 14, Topic Modeling. To preserve the nuances of word usage, we do not lemmatize the tokens.
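As a rough illustration of tokenizing without lemmatization, the following minimal sketch uses a blank spaCy English pipeline. The `tokenize` helper, the sample sentence, and the choice to lowercase and drop punctuation are assumptions made for this example; the notebook itself may load a full pretrained model and apply different filters.

```python
import spacy

# A blank English pipeline provides only the rule-based tokenizer, so no
# pretrained model download is required (an assumption of this sketch).
nlp = spacy.blank("en")


def tokenize(text):
    """Return lowercased surface forms, skipping punctuation and whitespace.

    The hypothetical helper reads token.text rather than token.lemma_, so
    inflected forms such as 'increased' and 'increasing' stay distinct.
    """
    doc = nlp(text)
    return [t.text.lower() for t in doc if not (t.is_punct or t.is_space)]


# Made-up sentence standing in for a passage from a filing.
sample = "Revenues increased, while costs are increasing."
tokens = tokenize(sample)
print(tokens)
```

Had the tokens been lemmatized, both verb forms would typically collapse to the same lemma, and the distinction between a completed change and an ongoing one would be lost.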