Doing Data Science by Rachel Schutt, Cathy O'Neil

Chapter 13. Lessons Learned from Data Competitions: Data Leakage and Model Evaluation

The contributor for this chapter is Claudia Perlich. Claudia has been the Chief Scientist at Media 6 Degrees (M6D) for the past few years. Before that, she was in the data analytics group at the IBM center that developed Watson, the computer that won Jeopardy! (although she didn’t work on that project). Claudia holds a master’s in computer science and earned her PhD in information systems at NYU. She now teaches a data science class to business students, where she addresses how to assess data science work and how to manage data scientists.

Claudia is also a famously successful data mining competition winner. She won the KDD Cup in 2003, 2007, 2008, and 2009, the ILP Challenge in 2005, the INFORMS Challenge in 2008, and the Kaggle HIV competition in 2010.

More recently, she has turned to organizing data mining competitions, first the INFORMS Challenge in 2009 and then the Heritage Health Prize in 2011. Claudia claims to be retired from competition. Fortunately for the class, she provided some great insights into what can be learned from data competitions. From the many competitions she’s entered, she has learned a great deal about data leakage in particular, and about how to evaluate the models she builds for them.

Claudia’s Data Scientist Profile

Claudia started by asking what people’s reference point might be to evaluate where they stand with their own data science profile ...
