Appendix B. Machine Learning Quick Reference: Algorithms

Penalized Regression

Common Usage

  • Supervised regression
  • Supervised classification

Common Concerns

  • Missing Values
  • Outliers
  • Standardization
  • Parameter tuning

Suggested Scale

  • Small to large data sets

Interpretability

  • High

Suggested Usage

  • Modeling linear or linearly separable phenomena
  • Manually specifying nonlinear and explicit interaction terms
  • Well-suited for N << p (see the sketch after this list)
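
As a concrete illustration of the usage above, the following is a minimal sketch of penalized regression, assuming scikit-learn and synthetic wide data (far more inputs than observations); the data, the elastic net choice, and the parameter grid are illustrative assumptions, not prescriptions. Standardization and cross-validated penalty tuning address the concerns listed earlier.

    # Minimal sketch (assumed setup): elastic net regression with scikit-learn
    # on synthetic wide data, standardized and tuned by cross-validation.
    import numpy as np
    from sklearn.linear_model import ElasticNetCV
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # Synthetic data with far more inputs than observations (N << p).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 200))
    y = 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.5, size=50)

    # Standardize inputs, then let cross-validation pick the penalty strength.
    model = make_pipeline(
        StandardScaler(),
        ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=5),
    )
    model.fit(X, y)
    print(model.named_steps["elasticnetcv"].alpha_)  # selected penalty strength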

Naïve Bayes

Common Usage

  • Supervised classification

Common Concerns

  • Strong conditional independence assumption among input variables
  • Infrequent categorical levels

Suggested Scale

  • Small to extremely large data sets

Interpretability

  • Moderate

Suggested Usage

  • Modeling linearly separable phenomena in large data sets
  • Well-suited for extremely large data sets where complex methods are intractable (see the sketch after this list)
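
The sketch below shows one minimal way to apply naive Bayes to a classification task, here a toy text classifier assumed to be built with scikit-learn's CountVectorizer and MultinomialNB; the corpus and labels are invented for illustration, and real applications would use far larger data sets.

    # Minimal sketch (assumed setup): a multinomial naive Bayes text classifier
    # built with scikit-learn on a tiny invented corpus.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    docs = ["free prize offer", "meeting agenda attached",
            "claim your free offer", "project status meeting"]
    labels = ["spam", "ham", "spam", "ham"]

    # Word counts feed the model; each word is treated as conditionally
    # independent given the class label.
    clf = make_pipeline(CountVectorizer(), MultinomialNB())
    clf.fit(docs, labels)
    print(clf.predict(["free offer for the meeting"]))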

Decision Trees

Common Usage

  • Supervised regression
  • Supervised classification

Common Concerns

  • Instability with small training data sets
  • Gradient boosting can be unstable with noise or outliers
  • Overfitting
  • Parameter tuning

Suggested Scale

  • Medium to large data sets

Interpretability

  • Moderate

Suggested Usage

  • Modeling nonlinear and nonlinearly separable phenomena in large, dirty data
  • Interactions considered automatically, but implicitly
  • Missing values and outliers in input variables handled automatically in many implementations
  • Decision tree ensembles, e.g., random forests and gradient boosting, can increase prediction accuracy (see the sketch after this list)
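
The sketch below contrasts a single depth-limited decision tree with a random forest ensemble on a synthetic nonlinear problem; scikit-learn, the data set, the depth limit, and the ensemble size are all illustrative assumptions, and gradient boosting could be substituted for the forest.

    # Minimal sketch (assumed setup): a depth-limited decision tree versus a
    # random forest ensemble on a synthetic nonlinear classification problem.
    from sklearn.datasets import make_moons
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_moons(n_samples=500, noise=0.3, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # A single tree captures the nonlinear boundary but can overfit;
    # limiting depth is one common tuning step.
    tree = DecisionTreeClassifier(max_depth=4, random_state=0)
    tree.fit(X_train, y_train)

    # An ensemble of trees usually improves accuracy at some cost to interpretability.
    forest = RandomForestClassifier(n_estimators=200, random_state=0)
    forest.fit(X_train, y_train)

    print("single tree  :", tree.score(X_test, y_test))
    print("random forest:", forest.score(X_test, y_test))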
