KeystoneML: Optimized large-scale machine-learning pipelines on Apache Spark
Date: This event took place live on May 17, 2016
Presented by: Evan Sparks
Duration: Approximately 60 minutes.
Questions? Please send email to
Moderated By: Ben Lorica
KeystoneML is an open-source software framework developed by the AMPLab for building large-scale machine-learning pipelines that run on Apache Spark. Evan Sparks describes the principles behind KeystoneML and introduces its programming model through example pipelines in NLP and image classification. Using these examples, Evan outlines the optimizations that KeystoneML makes to increase training throughput while preserving correctness, and presents end-to-end results that demonstrate the scalability of the system to hundreds of nodes.
After this webcast, you'll have learned:
About Evan Sparks
Evan Sparks is a PhD student in computer science at UC Berkeley working in the AMPLab. Evan's research focuses on the design and implementation of distributed systems for large-scale data analysis and machine learning. Prior to Berkeley, he spent several years tackling large-scale data problems as a quantitative financial analyst at MDT Advisers and a product engineer at Recorded Future. He holds a bachelor's degree from Dartmouth College and a master's degree in computer science from UC Berkeley. Twitter: @evanrsparks
About Ben Lorica
Ben Lorica is the Chief Data Scientist and Director of Content Strategy for Data at O'Reilly Media, Inc. He has applied business intelligence, data mining, machine learning, and statistical analysis in a variety of settings, including direct marketing, consumer and market research, targeted advertising, text mining, and financial engineering. His background includes stints with an investment management company, internet startups, and financial services.