Depending on your requirements, there are many ways to build machine learning models with existing tools, such as Python’s scikit-learn, R, and TensorFlow. What makes Spark’s Machine Learning library (MLlib) especially useful is its ability to train models at scale by distributing the training work. This lets users build models quickly on very large datasets and handle preprocessing and data preparation within the Spark framework itself.
This chapter focuses on how to leverage MLlib for building and applying various machine learning models. The first ...