Training and Exporting Machine Learning Models in Spark

by Hollin Wilkins, Jason Slepicka

Released December 2017

Publisher(s): O'Reilly Media, Inc.

ISBN: 9781491988824

Video description

Spark ML provides a rich set of tools and models for training, scoring, evaluating, and exporting machine learning models. This video walks you through each step in the process. You'll explore the basics of Spark's DataFrame, Transformer, Estimator, Pipeline, and Parameter abstractions, and how to use the Spark API to create model uniformity and comparability. You'll learn how to create meaningful models and labels from a raw dataset; train and score a variety of models; target price predictions; compare results using MAE, MSE, and other scores; and employ the Spark ML evaluator to automate the parameter-tuning process using cross-validation. To complete the lesson, you'll learn to export and serialize a Spark-trained model as PMML (an industry standard for model serialization), so you can deploy it in applications outside the Spark cluster environment.

Gain hands-on experience in training, scoring, evaluating, and exporting machine learning models
Understand how to use the Spark API to create model uniformity and comparability
Explore feature extraction, training, scoring, and hyperparameter tuning using Spark ML
Understand how to use a model trained in Spark and deploy it in other applications

Hollin Wilkins is the cofounder of Combust, Inc., an ML/AI start-up in the SF Bay Area. A data scientist and software engineer formerly with TrueCar, Hollin has worked with machine learning, high-performance microservices, and software development since 2011.

Jason Slepicka is a senior data engineer with DataScience, where he builds pipelines and data science platform infrastructure. Jason is working on his PhD in Computer Science at the University of Southern California Information Sciences Institute.