Spark ML provides a rich set of tools for training, scoring, evaluating, and exporting machine learning models. This video walks you through each step in the process. You'll explore the basics of Spark's DataFrame, Transformer, Estimator, Pipeline, and Parameter abstractions, and how to use the Spark API to create model uniformity and comparability. You'll learn how to create meaningful features and labels from a raw dataset; train and score a variety of models; target price predictions; compare results using MAE, MSE, and other scores; and employ the Spark ML evaluator to automate the parameter-tuning process using cross-validation. To complete the lesson, you'll learn to export and serialize a Spark-trained model as PMML (an industry standard for model serialization), so you can deploy it in applications outside the Spark cluster environment.
- Gain hands-on experience in training, scoring, evaluating, and exporting machine learning models
- Understand how to utilize the Spark API to create model uniformity and comparability
- Explore feature extraction, training, scoring, and hyperparameter tuning using Spark ML
- Understand how to use a model trained in Spark and deploy it in other applications
Hollin Wilkins is the cofounder of Combust, Inc., an ML/AI start-up in the SF Bay Area. A data scientist and software engineer formerly with TrueCar, Hollin has worked with machine learning, high-performance microservices, and software development since 2011.
Jason Slepicka is a senior data engineer with DataScience, where he builds pipelines and data science platform infrastructure. Jason is working on his PhD in Computer Science at the University of Southern California Information Sciences Institute.
- Title: Training and Exporting Machine Learning Models in Spark
- Release date: December 2017
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781491988824