Evan Sparks

KeystoneML: Optimized large-scale machine-learning pipelines on Apache Spark

Date: This event took place live on May 17, 2016

Presented by: Evan Sparks

Duration: Approximately 60 minutes.

Questions? Please send email to

Moderated By: Ben Lorica

Description:

KeystoneML is an open-source software framework developed by the AMPLab for building large-scale machine-learning pipelines that run on Apache Spark. Evan Sparks describes the principles behind KeystoneML and introduces its programming model through example pipelines in NLP and image classification. Using these examples, Evan outlines the optimizations that KeystoneML makes to increase training throughput while preserving correctness, and presents end-to-end results that demonstrate the scalability of the system to hundreds of nodes.

After this webcast, you'll have learned:

  • The KeystoneML programming model.
  • How to work with KeystoneML to construct new pipelines.
  • How salient aspects of the KeystoneML optimizer work.
  • How KeystoneML achieves high performance and scalable model training while maintaining a high-level programming interface.

About Evan Sparks

Evan Sparks is a PhD student in computer science at UC Berkeley working in the AMPLab. Evan's research focuses on the design and implementation of distributed systems for large-scale data analysis and machine learning. Prior to Berkeley, he spent several years tackling large-scale data problems as a quantitative financial analyst at MDT Advisers and a product engineer at Recorded Future. He holds a bachelor's degree from Dartmouth College and a master's degree in computer science from UC Berkeley. Twitter: @evanrsparks

About Ben Lorica

Ben Lorica is the Chief Data Scientist and Director of Content Strategy for Data at O'Reilly Media, Inc. He has applied business intelligence, data mining, machine learning, and statistical analysis in a variety of settings, including direct marketing, consumer and market research, targeted advertising, text mining, and financial engineering. His background includes stints with an investment management company, internet startups, and financial services.