Lukas Biewald

Real-world Active Learning

Date: This event took place live on August 21, 2014

Presented by: Lukas Biewald

Duration: Approximately 60 minutes.

Cost: Free

Questions? Please send email to


Hosted By: Ben Lorica

Machine learning research is often not applied to real-world situations. The improvements are often small and the added complexity is high, so except in special situations, industry doesn't take advantage of advances in the academic literature.

Active learning is an example where research proposes a simple strategy that makes a huge difference, and almost everyone applying machine learning to real-world use cases is doing it or should be doing it. Active learning is the practice of taking cases where the model has low confidence, getting them labeled, and then using those labels as additional training data.
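The selection step described above can be sketched in a few lines of Python. This is a minimal illustration and not code from the webcast: it assumes the model outputs class probabilities for each unlabeled item and treats the highest class probability as the model's confidence. The function names and the sample probabilities are hypothetical.

```python
from typing import List, Sequence


def confidence(probs: Sequence[float]) -> float:
    """Confidence for one item: the probability of its most likely class."""
    return max(probs)


def select_for_labeling(predictions: Sequence[Sequence[float]], budget: int) -> List[int]:
    """Return the indices of the `budget` least-confident predictions."""
    ranked = sorted(range(len(predictions)), key=lambda i: confidence(predictions[i]))
    return ranked[:budget]


# Hypothetical class probabilities for five unlabeled items (binary classifier).
unlabeled_predictions = [
    [0.98, 0.02],
    [0.55, 0.45],
    [0.10, 0.90],
    [0.51, 0.49],
    [0.70, 0.30],
]

# Items 3 and 1 are the ones the model is least sure about.
to_label = select_for_labeling(unlabeled_predictions, budget=2)
print(to_label)  # [3, 1]
```

In a full loop, the selected items would be sent out for labeling, added to the training set, and the model retrained before the next round of selection.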

Webcast attendees will learn simple, practical ways to improve their models by cleaning up and tweaking the distribution of their training data. They will also learn about best practices from real-world cases where active learning and data selection turned models that were completely unusable in production into extremely effective ones.

About Lukas Biewald

Lukas Biewald is the founder and CEO of CrowdFlower. Founded in 2009, CrowdFlower is a data enrichment platform that taps into an on-demand workforce to help companies collect, clean and label data to make it more useful.

Following his graduation from Stanford University with a B.S. in Mathematics and an M.S. in Computer Science, Lukas led the Search Relevance Team for Yahoo! Japan. He then worked as a senior data scientist on the Ranking and Management Team at Powerset, Inc., acquired by Microsoft in 2008. Recently, Lukas won the Netexplorateur Award for GiveWork, a collaboration with Samasource that brings digital work to refugees worldwide. Lukas is also an expert-level Go player. Twitter Handle: @l2k

About Ben Lorica

Ben Lorica is the Chief Data Scientist and Director of Content Strategy for Data at O'Reilly Media, Inc. He has applied Business Intelligence, Data Mining, Machine Learning and Statistical Analysis in a variety of settings including Direct Marketing, Consumer and Market Research, Targeted Advertising, Text Mining, and Financial Engineering. His background includes stints with an investment management company, internet startups, and financial services. He writes regularly about Big Data and Data Science on the O'Reilly Data blog.
