Spark ML provides a rich set of tools for training, scoring, evaluating, and exporting machine learning models. This video walks you through each step in the process. You'll explore the basics of Spark's DataFrame, Transformer, Estimator, Pipeline, and Parameter abstractions and learn how to use the Spark API to keep models uniform and comparable. You'll learn how to create meaningful features and labels from a raw dataset; train and score a variety of models for price prediction; compare results using MAE, MSE, and other scores; and employ the Spark ML evaluator to automate parameter tuning using cross-validation. To complete the lesson, you'll learn to export and serialize a Spark-trained model as PMML (an industry standard for model serialization), so you can deploy it in applications outside the Spark cluster environment.
- Gain hands-on experience in training, scoring, evaluating, and exporting machine learning models
- Understand how to utilize the Spark API to create model uniformity and comparability
- Explore feature extraction, training, scoring, and hyper-parameter tuning using Spark ML
- Understand how to use a model trained in Spark and deploy it in other applications
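As a rough illustration of the MAE and MSE comparison mentioned above, here is a minimal plain-Python sketch. It does not use Spark and is not taken from the video; the function names and the sample prices are hypothetical, chosen only to show how the two scores can rank models differently.

```python
def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical prices and predictions from two made-up models.
actual_prices = [200.0, 150.0, 300.0]
model_a = [210.0, 140.0, 290.0]  # off by 10 on every example
model_b = [200.0, 150.0, 330.0]  # exact twice, off by 30 once

for name, predicted in [("model_a", model_a), ("model_b", model_b)]:
    mae = mean_absolute_error(actual_prices, predicted)
    mse = mean_squared_error(actual_prices, predicted)
    print(f"{name}: MAE={mae} MSE={mse}")
```

Both hypothetical models have the same MAE, but the second has a higher MSE because squaring weights its single large miss more heavily. That difference is one reason to compare models on more than one score.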
Hollin Wilkins is the cofounder of Combust, Inc., an ML/AI start-up in the SF Bay Area. A data scientist and software engineer formerly with TrueCar, Hollin has worked with machine learning, high-performance microservices, and software development since 2011.
Jason Slepicka is a senior data engineer with DataScience, where he builds pipelines and data science platform infrastructure. Jason is working on his PhD in Computer Science at the University of Southern California Information Sciences Institute.
- Title: Training and Exporting Machine Learning Models in Spark
- Release date: December 2017
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781491988824