November 2016
Beginner to intermediate
687 pages
15h 31m
English
The nltk.corpus package consists of a number of corpus reader classes that can be used to obtain the contents of various corpora.
The Treebank corpus can also be accessed from nltk.corpus. The identifiers of its files can be obtained using fileids():
>>> import nltk
>>> import nltk.corpus
>>> print(str(nltk.corpus.treebank).replace('\\\\','/'))
<BracketParseCorpusReader in 'C:/nltk_data/corpora/treebank/combined'>
>>> nltk.corpus.treebank.fileids()
['wsj_0001.mrg', 'wsj_0002.mrg', 'wsj_0003.mrg', 'wsj_0004.mrg', 'wsj_0005.mrg',
 'wsj_0006.mrg', 'wsj_0007.mrg', 'wsj_0008.mrg', 'wsj_0009.mrg', 'wsj_0010.mrg',
 'wsj_0011.mrg', 'wsj_0012.mrg', 'wsj_0013.mrg', 'wsj_0014.mrg', 'wsj_0015.mrg',
 'wsj_0016.mrg', 'wsj_0017.mrg', 'wsj_0018.mrg', 'wsj_0019.mrg', ...
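The session above needs the Treebank sample installed under nltk_data. As a minimal sketch of how a corpus reader pairs a root directory with file identifiers, the following builds a throwaway one-file corpus and reads it with BracketParseCorpusReader, the reader class named in the output above. The file name wsj_demo.mrg, the sentence inside it, and the variable names are made up for illustration; they are not part of the real Treebank data.

```python
import os
import tempfile

from nltk.corpus.reader import BracketParseCorpusReader

# Hypothetical one-sentence corpus file in bracketed parse format.
# Neither the file name nor the sentence comes from the real Treebank sample.
corpus_root = tempfile.mkdtemp()
demo_path = os.path.join(corpus_root, "wsj_demo.mrg")
with open(demo_path, "w", encoding="utf8") as handle:
    handle.write(
        "(S (NP (NNP Pierre) (NNP Vinken)) "
        "(VP (VBZ joins) (NP (DT the) (NN board))) (. .))\n"
    )

# The second argument is a regular expression selecting files under the root.
reader = BracketParseCorpusReader(corpus_root, r".*\.mrg")

# fileids() lists the files the reader found, as in the Treebank session.
file_ids = reader.fileids()
print(file_ids)

# words() accepts a file identifier and returns the tokens in that file.
words = list(reader.words(file_ids[0]))
print(words)

# parsed_sents() returns the bracketed sentences as nltk.Tree objects.
first_tree = reader.parsed_sents(file_ids[0])[0]
print(first_tree.label())
```

If the Treebank sample is installed, the same methods are available on nltk.corpus.treebank, for example nltk.corpus.treebank.words('wsj_0001.mrg').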