January 2018
Intermediate to advanced
470 pages
11h 9m
English
For this mini deployment, let's use a real-life dataset: PubMed. A sample dataset of PubMed terms can be downloaded from https://nlp.stanford.edu/software/tmt/tmt-0.4/examples/pubmed-oa-subset.csv. The link serves a file in CSV format, although the file has an odd name: 4UK1UkTX.csv.
More specifically, the dataset contains the abstracts of some biological articles, along with their publication year and serial number. A glimpse is given in the following figure:

We have already saved the trained LDA model for future use, as follows:
params.ldaModel.save(spark.sparkContext, ...
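To make the save-and-reload cycle concrete, here is a minimal, self-contained sketch using Spark MLlib's RDD-based LDA API. It is not the chapter's own code: the toy corpus, the object name LdaSaveLoadSketch, and the path output/lda-sketch-model are all hypothetical, and it assumes the model was trained with the default EM optimizer, which produces a DistributedLDAModel.

```scala
import org.apache.spark.mllib.clustering.{DistributedLDAModel, LDA}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.sql.SparkSession

object LdaSaveLoadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("LdaSaveLoadSketch")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Toy corpus (hypothetical): (document id, term-count vector over a 4-term vocabulary)
    val corpus = sc.parallelize(Seq(
      (0L, Vectors.dense(2.0, 1.0, 0.0, 0.0)),
      (1L, Vectors.dense(3.0, 2.0, 0.0, 1.0)),
      (2L, Vectors.dense(0.0, 0.0, 4.0, 2.0)),
      (3L, Vectors.dense(0.0, 1.0, 3.0, 3.0))
    ))

    // The default optimizer is EM, so the trained model is a DistributedLDAModel
    val model = new LDA().setK(2).setMaxIterations(10).run(corpus)

    // Hypothetical output path; save() fails if the directory already exists
    val modelPath = "output/lda-sketch-model"
    model.save(sc, modelPath)

    // Later, possibly in another application: reload the model and inspect its topics
    val restored = DistributedLDAModel.load(sc, modelPath)
    restored.describeTopics(3).zipWithIndex.foreach { case ((terms, weights), topic) =>
      println(s"Topic $topic: " + terms.zip(weights).mkString(", "))
    }

    spark.stop()
  }
}
```

If the model had instead been trained with the online optimizer, the matching loader would be LocalLDAModel.load, since the saved format records which kind of model was written.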