Scala Machine Learning Projects by Md. Rezaul Karim

Step 2 - Creating the vocabulary and token counts to train the LDA model after text pre-processing

The run() method takes a Params object that carries settings such as the input text, a predefined vocabulary size, and a stop word file:

def run(params: Params)

Inside the run() method, text pre-processing for the LDA model then starts as follows:

// Load documents, and prepare them for LDA.
val preprocessStart = System.nanoTime()
val (corpus, vocabArray, actualNumTokens) = preprocess(params.input, params.vocabSize, params.stopwordFile)
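To make the three returned values more concrete, here is a minimal sketch of what a pre-processing step of this kind might compute. It is not the book's implementation: it uses plain Scala collections instead of Spark, takes in-memory documents and a stop-word set rather than file paths, and every name in it (SketchPreprocess, tokenize, the tokenization rule itself) is a hypothetical stand-in chosen for illustration.

```scala
// Spark-free sketch of vocabulary building and token counting.
// All names and the tokenization rule are assumptions, not the book's code.
object SketchPreprocess {

  // Lowercase, split on non-letters, drop short tokens and stop words.
  def tokenize(text: String, stopWords: Set[String]): Seq[String] =
    text.toLowerCase.split("[^a-z]+").toSeq
      .filter(t => t.length > 2 && !stopWords.contains(t))

  // Returns (corpus, vocabArray, actualNumTokens), mirroring the tuple in the text.
  // Each corpus entry maps a vocabulary index to that term's count in one document.
  def preprocess(
      docs: Seq[String],
      vocabSize: Int,
      stopWords: Set[String]): (Seq[Map[Int, Int]], Array[String], Long) = {
    val tokenized = docs.map(tokenize(_, stopWords))

    // Global term frequencies; keep the vocabSize most frequent terms.
    // Ties are broken alphabetically so the result is deterministic.
    val termCounts = tokenized.flatten
      .groupBy(identity)
      .map { case (term, occurrences) => (term, occurrences.size) }
    val vocabArray = termCounts.toSeq
      .sortBy { case (term, count) => (-count, term) }
      .take(vocabSize)
      .map(_._1)
      .toArray
    val index = vocabArray.zipWithIndex.toMap

    // Per-document counts, restricted to terms that made it into the vocabulary.
    val corpus = tokenized.map { tokens =>
      tokens.flatMap(t => index.get(t))
        .groupBy(identity)
        .map { case (i, occurrences) => (i, occurrences.size) }
    }

    // Total number of in-vocabulary tokens across all documents.
    val actualNumTokens = corpus.map(_.values.sum.toLong).sum
    (corpus, vocabArray, actualNumTokens)
  }
}
```

In this sketch, actualNumTokens counts only tokens that survive both stop-word filtering and the vocabulary cut, so it can be smaller than the raw word count of the input.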

The Params case class defines the parameters used to train the LDA model. It is declared as follows:

// Setting the parameters before training the LDA model
case class Params(var input: String = "", var ldaModel: LDAModel = null,
    k: Int = 5, maxIterations: ...
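Since the declaration above is truncated, the following sketch only illustrates the general pattern of a parameter case class with default arguments. SketchParams is a hypothetical stand-in: the field names input, k, vocabSize, and stopwordFile appear in the text, and the defaults for input and k match it, but the types and defaults of vocabSize and stopwordFile, as well as the sample path, are assumptions.

```scala
// Hypothetical stand-in for the truncated Params class.
// Only input = "" and k = 5 come from the text; other defaults are assumed.
case class SketchParams(
    input: String = "",
    k: Int = 5,
    vocabSize: Int = 1000,       // assumed default
    stopwordFile: String = "")   // assumed default

object SketchParamsDemo {
  // Every field falls back to its default unless overridden by name.
  val defaults: SketchParams = SketchParams()

  // copy returns a modified instance and leaves the original untouched.
  // "data/docs.txt" is a made-up path used only for illustration.
  val custom: SketchParams = defaults.copy(input = "data/docs.txt", k = 10)
}
```

Default arguments keep call sites short when only a few settings change, and copy makes it easy to derive one configuration from another.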
