MLlib is Apache Spark's scalable machine learning library. It provides many commonly used machine learning algorithms and utilities, with built-in support for:
- Representing data as vectors and matrices
- Computing basic statistics (summary statistics and correlations), drawing simple random and stratified samples, and running basic hypothesis tests
- Performing classification and regression modeling
- Collaborative filtering
- Performing dimensionality reduction
- Conducting feature extraction and transformation
- Frequent pattern mining
- Low-level optimization routines (intended for developers)
- Exporting PMML models
MLlib is still under active development, and new algorithms are expected to be added with each release.
In line with Apache ...