We will begin the modeling phase using the following steps:

  1. We will start by importing the dataset and processing the text using the following lines of code:
import cc.mallet.types.*;
import cc.mallet.pipe.*;
import cc.mallet.pipe.iterator.*;
import cc.mallet.topics.*;
import java.util.*;
import java.util.regex.*;

public class TopicModeling {
  public static void main(String[] args) throws Exception {
    String dataFolderPath = "data/bbc";
    String stopListFilePath = "data/stoplists/en.txt";
  2. We will then create a default pipeline object, as previously described:
    ArrayList<Pipe> pipeList = new ArrayList<Pipe>();
    pipeList.add(new Input2CharSequence("UTF-8"));
    Pattern tokenPattern = Pattern.compile("[\\p{L}\\p{N}_]+");
    ...
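
To see what the token pattern in the pipeline accepts, the following minimal sketch applies the same regular expression with plain java.util.regex, outside of MALLET. The class name TokenPatternDemo, the helper tokenize, and the sample sentence are hypothetical and used only for illustration. The sketch shows only which character runs the pattern matches (letters, digits, and underscores) and does not reproduce the rest of the pipeline.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the book's TopicModeling example.
public class TokenPatternDemo {

    // Same regular expression as tokenPattern in the pipeline step:
    // one or more Unicode letters, Unicode digits, or underscores.
    static final Pattern TOKEN_PATTERN = Pattern.compile("[\\p{L}\\p{N}_]+");

    // Collects every non-overlapping match of the pattern, in order.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN_PATTERN.matcher(text);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Apostrophes, hyphens, commas, and spaces are not in the character
        // class, so they act as token boundaries.
        System.out.println(tokenize("BBC's top-10 stories, 2004!"));
        // prints [BBC, s, top, 10, stories, 2004]
    }
}
```

Note that the pattern splits on apostrophes and hyphens, so "BBC's" yields the fragments "BBC" and "s". Short fragments like these are one reason a stop list, such as the one referenced by stopListFilePath, is typically applied after tokenization.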
