Deploying Spark ML Pipelines in Production on AWS
on-demand course

with Jason Slepicka
December 2017
Content level: Advanced
23m
English
O'Reilly Media, Inc.
Closed Captioning available in German, English, Spanish, French, Japanese, Korean, Portuguese (Portugal, Brazil), Chinese (Simplified), Chinese (Traditional)

Overview

Translating a Spark application from running in a local environment to running on a production cluster in the cloud requires several critical steps, including publishing artifacts, installing dependencies, and defining the steps in a pipeline. This video is a hands-on guide through the process of deploying your Spark ML pipelines in production. You’ll learn how to create a pipeline that supports model reproducibility—making your machine learning models more reliable—and how to update your pipeline incrementally as the underlying data change. Learners should have basic familiarity with the following: Scala or Python; Hadoop, Spark, or Pandas; SBT or Maven; Amazon Web Services such as S3, EMR, and EC2; Bash, Docker, and REST.

  • Understand how various cloud ecosystem components interact (e.g., Amazon S3, EMR, and EC2)
  • Learn how to architect the components of a cloud ecosystem into an end-to-end model pipeline
  • Explore the capabilities and limitations of Spark in building an end-to-end model pipeline
  • Learn to write, publish, deploy, and schedule a Spark ETL process on AWS with EMR
  • Understand how to create a pipeline that supports model reproducibility and reliability
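
The deployment steps named in the overview (publishing an artifact, installing dependencies, and defining the steps in a pipeline) can be sketched as plain data. The following is a minimal illustration, not the course's own code: it builds the dictionaries that the EMR API accepts for bootstrap actions and steps, running Spark through `command-runner.jar`. The bucket name, JAR path, class names, script name, and the `--seed` application argument are all hypothetical placeholders.

```python
import json

# Hypothetical locations: replace with your own bucket and published artifact.
BUCKET = "s3://example-bucket"
ARTIFACT = f"{BUCKET}/artifacts/ml-pipeline-assembly-1.0.0.jar"  # pinned version


def bootstrap_action(name, script_uri):
    """Describe a bootstrap script that installs dependencies on each node."""
    return {
        "Name": name,
        "ScriptBootstrapAction": {"Path": script_uri, "Args": []},
    }


def spark_step(name, main_class, app_args, action_on_failure="CANCEL_AND_WAIT"):
    """Describe one EMR step that runs spark-submit via command-runner.jar."""
    args = [
        "spark-submit",
        "--deploy-mode", "cluster",
        "--class", main_class,
        ARTIFACT,
        *app_args,
    ]
    return {
        "Name": name,
        "ActionOnFailure": action_on_failure,
        "HadoopJarStep": {"Jar": "command-runner.jar", "Args": args},
    }


# Dependencies are installed before any step runs.
bootstrap = [bootstrap_action("install-deps", f"{BUCKET}/bootstrap/install_deps.sh")]

# Steps run in order; CANCEL_AND_WAIT stops later steps if an earlier one fails.
steps = [
    spark_step("etl", "com.example.Etl",
               ["--input", f"{BUCKET}/raw/", "--output", f"{BUCKET}/clean/"]),
    # A fixed seed (a hypothetical application flag) plus a pinned artifact
    # version is one simple way to keep training runs repeatable.
    spark_step("train", "com.example.Train",
               ["--input", f"{BUCKET}/clean/", "--seed", "42"]),
]

print(json.dumps({"BootstrapActions": bootstrap, "Steps": steps}, indent=2))
```

Assuming boto3 is installed and AWS credentials are configured, these lists could be passed as the `BootstrapActions` and `Steps` arguments of the EMR client's `run_job_flow` call, alongside the cluster settings that call also requires.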

Jason Slepicka is a senior data engineer with Los Angeles-based DataScience, where he builds pipelines and data science platform infrastructure. He has a decade of experience integrating data to support efforts like fighting human trafficking for DARPA, exploring the evolution of evolvability in yeast, and tracking intruders in computer networks. Jason has both a bachelor’s and a master’s degree in Computer Science from the University of Arizona and is working on his PhD in Computer Science at the University of Southern California Information Sciences Institute.

Publisher Resources

ISBN: 9781491988879