Machine learning with Spark

At this point in the chapter, we arrive at the main task of your job: creating a model to predict one or more attributes that are missing from the dataset. Machine learning models are well suited to this task, and Spark can give us a big hand here.

MLlib is Spark's machine learning library; although it is built in Scala and Java, its functions are also available in Python. It contains classification, regression, and recommendation algorithms, routines for dimensionality reduction and feature selection, and extensive functionality for text processing. All of them can cope with huge datasets and use the power of all the nodes in the cluster to achieve their goal.

As of now, it's ...
