Chapter 6. Memorization methods

This chapter covers

  • Building single-variable models
  • Cross-validated variable selection
  • Building basic multivariable models
  • Starting with decision trees, nearest neighbor, and naive Bayes models

The simplest methods in data science are what we call memorization methods. These methods generate answers by returning the majority category (in the case of classification) or the average value (in the case of scoring) of a subset of the original training data. They range from models that depend on a single variable (similar to the analyst’s pivot table), to decision trees (similar to what are called business rules), to nearest neighbor and naive Bayes methods.[1] In this chapter, you’ll learn how to ...
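To make the idea of a memorization method concrete, here is a minimal sketch of a single-variable model in Python. It is an illustration under stated assumptions, not the book's own code: the function names (`fit_majority_model`, `fit_mean_model`, `predict`) and the toy data are hypothetical. For each level of one categorical variable, the model memorizes either the majority outcome (classification) or the mean outcome (scoring) seen in the training rows with that level, and falls back to the overall majority or mean for levels it has never seen.

```python
from collections import Counter, defaultdict

def fit_majority_model(levels, outcomes):
    """Classification: memorize the majority outcome for each level."""
    per_level = defaultdict(Counter)
    for level, outcome in zip(levels, outcomes):
        per_level[level][outcome] += 1
    table = {lvl: counts.most_common(1)[0][0] for lvl, counts in per_level.items()}
    # Fallback for unseen levels: the overall majority outcome.
    default = Counter(outcomes).most_common(1)[0][0]
    return table, default

def fit_mean_model(levels, values):
    """Scoring: memorize the mean value for each level."""
    per_level = defaultdict(list)
    for level, value in zip(levels, values):
        per_level[level].append(value)
    table = {lvl: sum(vals) / len(vals) for lvl, vals in per_level.items()}
    # Fallback for unseen levels: the overall mean.
    default = sum(values) / len(values)
    return table, default

def predict(model, level):
    """Look up the memorized answer, or the fallback if the level is new."""
    table, default = model
    return table.get(level, default)

# Toy training data (made up for illustration only).
plan = ["basic", "basic", "basic", "plus", "plus", "pro"]
churned = ["yes", "yes", "no", "no", "no", "no"]
spend = [10.0, 20.0, 30.0, 40.0, 60.0, 90.0]

classifier = fit_majority_model(plan, churned)
scorer = fit_mean_model(plan, spend)
```

With this toy data, `predict(classifier, "basic")` returns the memorized majority for that level, and `predict(scorer, "plus")` returns the mean spend over the two `plus` rows. This is essentially what a pivot table computes: a grouped summary that is then reused as a prediction.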
