November 2018
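Another option for loading documents is cc.mallet.pipe.iterator.CsvIterator. Its constructor, CsvIterator(Reader, Pattern, int, int, int), assumes that all of the documents are in a single file and returns one instance per line, extracted by a regular expression. The constructor takes the following components: a Reader that supplies the input, a Pattern holding the regular expression applied to each line, and three int values giving the indexes of the regular expression groups that contain the data, the target label, and the document name, in that order.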
Consider a text document in the following format, specifying the document name, category, and content:
AP881218 local-news A 16-year-old student ...
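To make the role of the regular expression and the three group indexes concrete, here is a minimal sketch that uses only java.util.regex, so it runs without MALLET on the classpath. The pattern itself, the class name CsvLinePatternSketch, and the helper split are assumptions made for illustration. They are not taken from the text above, and other patterns would work equally well for this line format.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here for illustration only.
final class CsvLinePatternSketch {

    // Assumed pattern: first token = document name, second token = category,
    // everything after that = content. Tokens are separated by whitespace or commas.
    static final Pattern LINE = Pattern.compile("^(\\S*)[\\s,]*(\\S*)[\\s,]*(.*)$");

    // Group indexes as they appear in the pattern above.
    static final int NAME_GROUP = 1;
    static final int TARGET_GROUP = 2;
    static final int DATA_GROUP = 3;

    /** Splits one line into {name, category, content}. */
    static String[] split(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Line does not match pattern: " + line);
        }
        return new String[] {
            m.group(NAME_GROUP), m.group(TARGET_GROUP), m.group(DATA_GROUP)
        };
    }

    public static void main(String[] args) {
        String[] parts = split("AP881218 local-news A 16-year-old student ...");
        System.out.println("name     = " + parts[0]);
        System.out.println("category = " + parts[1]);
        System.out.println("content  = " + parts[2]);

        // With MALLET available, the same pattern and indexes would be handed to
        // the constructor in data, target, name order, for example:
        //   new CsvIterator(reader, LINE, DATA_GROUP, TARGET_GROUP, NAME_GROUP);
    }
}
```

Because the constructor expects the group indexes in data, target, name order, while the sample line lists name, category, content, the indexes would be passed as 3, 2, 1 under this assumed pattern.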