Practical Data Science with R

Chapter 6. Memorization methods

This chapter covers

Building single-variable models
Cross-validated variable selection
Building basic multivariable models
Starting with decision trees, nearest neighbor, and naive Bayes models

The simplest methods in data science are what we call memorization methods. These methods generate answers by returning the majority category (for classification) or the average value (for scoring) of a subset of the original training data. They range from models that depend on a single variable (similar to the analyst’s pivot table), to decision trees (similar to what are called business rules), to nearest neighbor and naive Bayes methods.^[1] In this chapter, you’ll learn how to ...
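As a rough illustration of the idea described above, the following sketch builds a single-variable memorization model in base R. It is not taken from the book: the toy data frame `train`, its columns `plan`, `churn`, and `spend`, and the helpers `majority` and `predict_level` are all hypothetical names chosen for this example. The model simply stores, for each level of one variable, the majority category (classification) or the mean (scoring) of the matching training rows.

```r
# Hypothetical toy training data (illustrative only, not from the book)
train <- data.frame(
  plan  = c("basic", "basic", "basic", "premium", "premium", "premium"),
  churn = c("yes", "yes", "no", "no", "no", "yes"),
  spend = c(10, 12, 14, 30, 34, 32),
  stringsAsFactors = FALSE
)

# Most frequent value in a vector (ties go to the first level in table order)
majority <- function(x) names(which.max(table(x)))

# Classification: memorize the majority outcome for each level of `plan`
class_model <- sapply(split(train$churn, train$plan), majority)

# Scoring: memorize the average outcome for each level of `plan`
score_model <- sapply(split(train$spend, train$plan), mean)

# "Prediction" is a lookup keyed on the level; unseen levels come back as NA
predict_level <- function(model, levels) unname(model[levels])

predict_level(class_model, c("premium", "basic"))  # "no" "yes"
predict_level(score_model, c("premium", "basic"))  # 32 12
```

Because the model is only a lookup table built from the training subset for each level, it has no answer for a level it never saw (here it returns `NA`), which is one reason such models are usually checked on held-out data.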


Practical Data Science with R by Nina Zumel, John Mount
