O'Reilly logo

Machine Learning with Spark - Second Edition by Nick Pentreath, Manpreet Singh Ghotra, Rajdeep Dua

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Summary

In this chapter, we took a deeper look into more complex text processing and explored MLlib's text feature extraction capabilities, in particular the tf-idf term weighting schemes. We covered examples of using the resulting tf-idf feature vectors to compute document similarity and train a newsgroup topic classification model. Finally, you learned how to use MLlib's cutting-edge Word2Vec model to compute a vector representation of words in a corpus of text and use the trained model to find words with contextual meaning that is similar to a given word. We also looked at using Word2Vec with Spark ML

In the next chapter, we will take a look at online learning, and you will learn how Spark Streaming relates to online learning models.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required