Appendix B. Machine Learning Quick Reference: Algorithms

Penalized Regression

Common Usage

  • Supervised regression
  • Supervised classification

Common Concerns

  • Missing values
  • Outliers
  • Standardization
  • Parameter tuning

Suggested Scale

  • Small to large data sets

Interpretability

  • High

Suggested Usage

  • Modeling linear or linearly separable phenomena
  • Manually specifying nonlinear and explicit interaction terms
  • Well-suited for N << p
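The points above can be sketched with scikit-learn. This is an illustrative example on synthetic data (all values are made up), showing the N << p setting, standardization before fitting, and the parameters that require tuning:

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.preprocessing import StandardScaler

# Illustrative N << p setting: 40 rows, 150 columns, sparse true signal
rng = np.random.default_rng(42)
X = rng.normal(size=(40, 150))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=40)

# Standardize first: the penalty treats every coefficient on the same scale
X_std = StandardScaler().fit_transform(X)

# alpha (penalty strength) and l1_ratio (L1/L2 mix) are the parameters to tune
model = ElasticNet(alpha=0.1, l1_ratio=0.9).fit(X_std, y)

# The L1 component zeroes out most coefficients, leaving a sparse,
# highly interpretable linear model
selected = np.flatnonzero(model.coef_)
```

Nonlinear or interaction effects are not discovered automatically; they must be added as explicit columns (for example, a manually constructed `X[:, 0] * X[:, 1]` product term) before fitting.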

Naïve Bayes

Common Usage

  • Supervised classification

Common Concerns

  • Strong conditional independence assumption among input variables
  • Infrequent categorical levels

Suggested Scale

  • Small to extremely large data sets

Interpretability

  • Moderate

Suggested Usage

  • Modeling linearly separable phenomena in large data sets
  • Well-suited for extremely large data sets where complex methods are intractable
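A minimal sketch of why naïve Bayes scales so well, using synthetic two-class data (the class means and sizes are arbitrary). Fitting a Gaussian naïve Bayes model is essentially one pass over the data to collect per-class means and variances:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Synthetic two-class data; each feature is drawn independently per class,
# which matches the model's conditional independence assumption
rng = np.random.default_rng(0)
X0 = rng.normal(loc=0.0, size=(200, 5))   # class 0 centered at 0
X1 = rng.normal(loc=2.0, size=(200, 5))   # class 1 centered at 2
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 200)

# Training reduces to estimating per-class feature means and variances,
# so cost grows linearly with the number of rows
clf = GaussianNB().fit(X, y)
accuracy = clf.score(X, y)
```

Because the fitted statistics are just per-class summaries, training can be distributed or streamed, which is why the method remains tractable where more complex methods are not.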

Decision Trees

Common Usage

  • Supervised regression
  • Supervised classification

Common Concerns

  • Instability with small training data sets
  • Gradient boosting can be unstable with noise or outliers
  • Overfitting
  • Parameter tuning

Suggested Scale

  • Medium to large data sets

Interpretability

  • Moderate

Suggested Usage

  • Modeling nonlinear and nonlinearly separable phenomena in large, dirty data
  • Interactions considered automatically, but implicitly
  • Missing values and outliers in input variables handled automatically in many implementations
  • Decision tree ensembles, e.g., random forests and gradient boosting, can increase prediction accuracy
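The trade-off between a single tree and an ensemble can be sketched as follows. This is an illustrative comparison on synthetic data (the data set and parameters are made up), not a benchmark:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic nonlinearly separable classification problem
X, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                           random_state=0)

# A single tree: moderately interpretable, but unstable on small samples
# and prone to overfitting without depth or pruning constraints
tree_scores = cross_val_score(DecisionTreeClassifier(random_state=0),
                              X, y, cv=5)

# A random forest: averaging many trees typically improves accuracy,
# at the cost of the single tree's readability
forest_scores = cross_val_score(RandomForestClassifier(random_state=0),
                                X, y, cv=5)
```

Interactions between inputs are captured automatically through successive splits, but only implicitly; recovering them requires post hoc inspection of the fitted trees.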
