Machine Learning with Spark - Second Edition by Nick Pentreath, Manpreet Singh Ghotra, Rajdeep Dua

Word2Vec with Spark ML on the 20 Newsgroups dataset

In this section, we look at how to use the Spark ML DataFrame-based API and the newer implementations introduced in Spark 2.0.x to create a Word2Vec model.

We start by creating a DataFrame from the dataset:

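import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val spConfig = (new SparkConf).setMaster("local").setAppName("SparkApp")
val spark = SparkSession
  .builder
  .appName("Word2Vec Sample")
  .config(spConfig)
  .getOrCreate()
import spark.implicits._

// wholeTextFiles returns an RDD of (filePath, fileContent) pairs, one per file
val rawDF = spark.sparkContext
  .wholeTextFiles("./data/20news-bydate-train/alt.atheism/*")

// Keep only the file content: turn control characters (newlines, tabs) into
// spaces so words on adjacent lines are not glued together, and drop '('
val temp = rawDF.map(x => x._2.map(c => if (c < ' ') ' ' else c).filter(_ != '('))

// Split each document into words and wrap the array in Tuple1 so toDF
// produces a single "text" column of type Array[String]
val textDF = temp.map(x => x.trim.split("\\s+")).map(Tuple1.apply)
  .toDF("text")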
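The character-level cleaning above can be tried on a single string without starting Spark. The following is a minimal pure-Scala sketch, assuming simple whitespace tokenization; the object `TokenizeSketch`, its two methods, and the sample text are hypothetical. It also shows why mapping control characters to spaces is safer than deleting them: deleting `'\n'` outright glues the last word of one line to the first word of the next.

```scala
object TokenizeSketch {
  // Control characters (including '\n' and '\t') become spaces, '(' is dropped,
  // and the text is split on runs of whitespace. Empty input yields no tokens.
  def tokenize(doc: String): Array[String] = {
    val cleaned = doc.map(c => if (c < ' ') ' ' else c).filter(_ != '(')
    cleaned.trim.split("\\s+").filter(_.nonEmpty)
  }

  // Filter-only variant for comparison: control characters are deleted rather
  // than replaced, and the text is split on single spaces.
  def tokenizeDroppingControlChars(doc: String): Array[String] =
    doc.filter(_ >= ' ').filter(_ != '(').split(" ")

  def main(args: Array[String]): Unit = {
    val doc = "From: someone\nSubject: (test) post"
    println(tokenize(doc).mkString("|"))                     // From:|someone|Subject:|test)|post
    println(tokenizeDroppingControlChars(doc).mkString("|")) // From:|someoneSubject:|test)|post
  }
}
```

Note that only `'('` is removed, so a closing `')'` survives in tokens such as `test)`; a real pipeline would usually strip punctuation more thoroughly.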

Next, we create an instance of the Word2Vec class and train ...
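A minimal sketch of that step, assuming the standard `org.apache.spark.ml.feature.Word2Vec` estimator. The toy corpus, the object and helper names (`Word2VecSketch`, `train`), and the parameter values (`vectorSize = 10`, `minCount = 1`, seed `42`) are illustrative assumptions, not the book's settings. The input has the same shape as `textDF`: one `Array[String]` per row in a column named `text`.

```scala
import org.apache.spark.ml.feature.{Word2Vec, Word2VecModel}
import org.apache.spark.sql.{DataFrame, SparkSession}

object Word2VecSketch {
  // Fits a Word2Vec model on a DataFrame whose "text" column holds Array[String].
  // Parameter values here are illustrative, chosen for a tiny corpus.
  def train(textDF: DataFrame): Word2VecModel =
    new Word2Vec()
      .setInputCol("text")
      .setOutputCol("result")
      .setVectorSize(10)
      .setMinCount(1) // keep every word, even those seen once
      .setSeed(42L)
      .fit(textDF)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").appName("Word2VecSketch").getOrCreate()
    import spark.implicits._

    // Hypothetical toy corpus standing in for the newsgroup documents
    val docs = Seq(
      "the quick brown fox jumps over the lazy dog",
      "the lazy dog sleeps in the sun",
      "a quick brown fox is quick"
    ).map(s => Tuple1(s.split(" "))).toDF("text")

    val model = train(docs)
    // One learned vector per vocabulary word (columns: word, vector)
    model.getVectors.show(false)
    // Vocabulary words closest to "fox" (columns: word, similarity)
    model.findSynonyms("fox", 3).show()
    spark.stop()
  }
}
```

`findSynonyms` ranks vocabulary words by cosine similarity to the query word's vector. On a corpus this small the neighbours are essentially noise, so treat the output as a smoke test of the pipeline rather than a meaningful result.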