November 2018
We will begin the modeling phase using the following steps:
import cc.mallet.types.*;
import cc.mallet.pipe.*;
import cc.mallet.pipe.iterator.*;
import cc.mallet.topics.*;
import java.util.*;
import java.util.regex.*;
import java.io.*;
public class TopicModeling {
    public static void main(String[] args) throws Exception {
        String dataFolderPath = "data/bbc";
        String stopListFilePath = "data/stoplists/en.txt";

        ArrayList<Pipe> pipeList = new ArrayList<Pipe>();
        pipeList.add(new Input2CharSequence("UTF-8"));
        Pattern tokenPattern = Pattern.compile("[\\p{L}\\p{N}_]+");
        ...
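To see what the token pattern in the listing matches, the following minimal sketch applies the same regular expression using only `java.util.regex`, without Mallet. The class name `TokenPatternDemo`, the helper `tokenize`, and the sample sentence are hypothetical and chosen for illustration; how the pattern is wired into the pipeline is not shown in the listing and is not assumed here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: not part of the original TopicModeling listing.
public class TokenPatternDemo {

    // Same expression as in the listing: one or more Unicode letters,
    // Unicode digits, or underscores.
    static final Pattern TOKEN_PATTERN = Pattern.compile("[\\p{L}\\p{N}_]+");

    // Collects every maximal run of characters that matches the pattern.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN_PATTERN.matcher(text);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Punctuation, whitespace, and symbols such as '%' act as separators.
        System.out.println(tokenize("Sales rose 12% in Q3, said the BBC's report."));
        // prints [Sales, rose, 12, in, Q3, said, the, BBC, s, report]
    }
}
```

Note that the apostrophe is not in the character class, so a possessive such as "BBC's" splits into two tokens, "BBC" and "s". Stray single-letter tokens like this are one reason a stop list is usually applied after tokenization.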