Practical Statistics for Data Scientists, 2nd Edition

by Peter Bruce, Andrew Bruce, Peter Gedeck
May 2020
360 pages
English
O'Reilly Media, Inc.

Chapter 6. Statistical Machine Learning

Recent advances in statistics have focused on developing more powerful automated techniques for predictive modeling, covering both regression and classification. These methods, like those discussed in the previous chapter, are supervised methods: they are trained on data where outcomes are known and learn to predict outcomes in new data. They fall under the umbrella of statistical machine learning and are distinguished from classical statistical methods in that they are data-driven and do not seek to impose linear or other overall structure on the data. The K-Nearest Neighbors method, for example, is quite simple: classify a record in accordance with how similar records are classified. The most successful and widely used techniques are based on ensemble learning applied to decision trees. The basic idea of ensemble learning is to use many models to form a prediction, as opposed to using just a single model. Decision trees are a flexible and automatic technique to learn rules about the relationships between predictor variables and outcome variables. It turns out that the combination of ensemble learning with decision trees leads to some of the best-performing off-the-shelf predictive modeling techniques.
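As a rough illustration of two ideas in the paragraph above, the sketch below classifies a record by the labels of its nearest neighbors and then combines several models by majority vote. It is a minimal sketch, not the book's code: the function names `knn_classify` and `ensemble_vote`, the toy data, and the choice of Euclidean distance are all assumptions made for illustration. The ensemble here votes over K-Nearest Neighbors models with different values of k rather than over decision trees, simply to keep the example short.

```python
from collections import Counter
import math


def knn_classify(train, labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training records."""
    # Euclidean distance from the query to every training record (an assumed metric)
    distances = [(math.dist(row, query), label) for row, label in zip(train, labels)]
    # Keep the k records closest to the query
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # The predicted class is the most common label among those neighbors
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def ensemble_vote(models, query):
    """Form one prediction from many models by majority vote."""
    votes = Counter(model(query) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two predictors per record, two outcome classes
train = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
labels = ["low", "low", "low", "high", "high", "high"]

# Single model: a 3-nearest-neighbor classifier
print(knn_classify(train, labels, (1.1, 1.0)))

# Ensemble: three KNN models that differ only in k, combined by voting
models = [lambda q, k=k: knn_classify(train, labels, q, k=k) for k in (1, 3, 5)]
print(ensemble_vote(models, (3.9, 4.1)))
```

With this toy data the first query sits among the "low" records and the second among the "high" records, so the two calls should print `low` and `high` respectively. Neither function fits a global equation to the data; each prediction comes directly from the training records, which is the sense in which such methods are data-driven.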

The development of many of the techniques in statistical machine learning can be traced back to the statisticians Leo Breiman (see Figure 6-1) at the University of California at Berkeley and Jerry Friedman at Stanford University. Their work, along ...



ISBN: 9781492072935