July 2017
NLTK ships with a subset of the Project Gutenberg collection, which can be freely accessed in this way:
>>> from nltk.corpus import gutenberg
>>> print(gutenberg.fileids())
[u'austen-emma.txt', u'austen-persuasion.txt', u'austen-sense.txt', u'bible-kjv.txt', u'blake-poems.txt', u'bryant-stories.txt', u'burgess-busterbrown.txt', u'carroll-alice.txt', u'chesterton-ball.txt', u'chesterton-brown.txt', u'chesterton-thursday.txt', u'edgeworth-parents.txt', u'melville-moby_dick.txt', u'milton-paradise.txt', u'shakespeare-caesar.txt', u'shakespeare-hamlet.txt', u'shakespeare-macbeth.txt', u'whitman-leaves.txt']
A single document can be accessed as a raw version or split into sentences or words:
>>> print(gutenberg.raw('milton-paradise.txt'))
[Paradise ...
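The three access modes mentioned above (raw text, sentences, words) can be compared side by side with a small helper. This is only a sketch: `describe_document`, `preview_chars` and `main` are hypothetical names introduced for illustration, while `raw()`, `sents()` and `words()` are the corpus reader methods. It assumes NLTK is installed, that the corpus has been fetched once with `nltk.download('gutenberg')`, and that the Punkt sentence tokenizer data needed by `sents()` is also available.

```python
def describe_document(reader, fileid, preview_chars=60):
    """Summarize the three views a corpus reader offers for one document.

    `reader` is expected to behave like nltk.corpus.gutenberg, i.e. to
    expose raw(), sents() and words() taking a file id.
    """
    raw = reader.raw(fileid)      # whole document as a single string
    sents = reader.sents(fileid)  # sequence of sentences, each a list of tokens
    words = reader.words(fileid)  # flat sequence of tokens
    return {
        "preview": raw[:preview_chars],
        "n_chars": len(raw),
        "n_sents": len(sents),
        "n_words": len(words),
        "first_sentence": list(sents[0]),
    }


def main():
    # Imported here so the helper above can also be used with any other
    # object that offers the same three methods.
    from nltk.corpus import gutenberg

    info = describe_document(gutenberg, 'milton-paradise.txt')
    for key, value in info.items():
        print(key, '->', value)
```

Calling `main()` prints a short preview of the document together with its character, sentence, and token counts. Passing the reader in as an argument, rather than hard-coding `gutenberg`, is a design choice that lets the same helper be tried on other NLTK corpora with the same interface.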