MLlib is Apache Spark's machine learning library. It is scalable and includes many commonly used machine learning algorithms. MLlib has built-in support for:

  • Handling data types in the form of vectors and matrices
  • Computing basic statistics such as summary statistics and correlations, drawing simple random and stratified samples, and running simple hypothesis tests
  • Performing classification and regression modeling
  • Collaborative filtering
  • Clustering
  • Performing dimensionality reduction
  • Conducting feature extraction and transformation
  • Frequent pattern mining
  • Optimization primitives (for developers)
  • Exporting PMML models
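
To make the basic-statistics item above more concrete, here is a minimal plain-Python sketch of the quantities it names: per-column summary statistics, a Pearson correlation, and a per-key (stratified) sample. This is not MLlib's API. The function names `column_summary`, `pearson`, and `stratified_sample` are hypothetical and chosen only for illustration, and the code runs on in-memory lists rather than distributed data.

```python
import math
import random


def column_summary(rows):
    """Return per-column means and sample variances for equal-length rows."""
    n = len(rows)
    cols = list(zip(*rows))  # transpose rows into columns
    means = [sum(c) / n for c in cols]
    variances = [
        sum((x - m) ** 2 for x in c) / (n - 1) for c, m in zip(cols, means)
    ]
    return means, variances


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def stratified_sample(pairs, fractions, seed=0):
    """Keep each (key, value) pair with the probability listed for its key.

    Keys missing from `fractions` are never sampled (an assumption of this
    sketch, not a statement about any library's behavior).
    """
    rng = random.Random(seed)
    return [(k, v) for k, v in pairs if rng.random() < fractions.get(k, 0.0)]
```

For example, `column_summary([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])` summarizes two perfectly correlated columns, and `stratified_sample(pairs, {"a": 1.0, "b": 0.0})` keeps every pair keyed `"a"` and none keyed `"b"`. A library such as MLlib computes the same kinds of quantities over partitioned data, which is where the scalability comes from.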

Spark MLlib is still under active development, and new algorithms are expected with each release.

In line with Apache ...
