ML
As discussed in the previous section, MLlib is built on RDDs and therefore inherits all of their disadvantages. Spark's ML is a separate library built on the DataFrame API. New features are now added to the ML library, while MLlib is kept in maintenance mode. Besides using structured APIs, Spark's ML lets you define a machine-learning pipeline, which is similar to the pipeline concept in scikit-learn.
The Spark ML library provides user-friendly APIs that let you combine a set of algorithms into a single pipeline of stages. Each stage performs a separate task (data cleaning, model training, or prediction) and provides the input to the next stage. The pipeline ...