© Butch Quinto 2020
B. Quinto, Next-Generation Machine Learning with Spark, https://doi.org/10.1007/978-1-4842-5669-5_2

2. Introduction to Spark and Spark MLlib

Butch Quinto
Carson, CA, USA

Simple models and a lot of data trump more elaborate models based on less data.

—Peter Norvig [i]

Spark is a unified framework for processing and analyzing large datasets. It provides high-level APIs in Scala, Python, Java, and R, along with powerful libraries: MLlib for machine learning, Spark SQL for SQL support, Spark Streaming for real-time stream processing, and GraphX for graph processing [ii]. Spark was created by Matei Zaharia at the University of California, Berkeley's AMPLab and was later donated to the Apache Software Foundation, becoming ...
