
Introduction to Machine Learning with R by Scott V. Burger

Chapter 3. Sampling Statistics and Model Training in R

Sampling and machine learning go hand in hand. In machine learning, we typically begin with a large dataset that we want to use to predict something. We usually split this data into a training set and a test set. We build a model on the training set, and then we unleash the fully trained model on the test set to see how well it performs. In some instances, running a machine learning model on an entire dataset can be very difficult, for example because of the sheer size of the data. In those cases, running the model on a small sample might achieve comparable accuracy, provided we test the results appropriately.
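Here is a minimal sketch of that workflow in Python, using only the standard library. It draws a random subsample from a larger dataset and splits it 80/20 into training and test sets. It then fits a straight line on the training portion and measures the error on the held-out test portion. Several details are hypothetical choices made for illustration, not the book's R code:

- the noisy linear dataset;
- the 200-row sample size;
- the 80/20 ratio;
- the helper names `split_train_test` and `fit_line`, which are hand-rolled here and are not any library's API.

```python
import random

def split_train_test(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of rows, then cut it into training and test lists."""
    rng = random.Random(seed)               # seeded so the split is reproducible
    shuffled = list(rows)                   # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def fit_line(points):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x

# Hypothetical "big" dataset: 1,000 points on y = 2x + 1 plus Gaussian noise.
noise = random.Random(1)
data = [(x, 2 * x + 1 + noise.gauss(0, 1)) for x in range(1000)]

# Work with a 200-row random sample instead of the full dataset.
sample = random.Random(0).sample(data, 200)

train, test = split_train_test(sample)     # 160 training rows, 40 test rows
a, b = fit_line(train)                     # the model only ever sees training rows
mse = sum((y - (a * x + b)) ** 2 for x, y in test) / len(test)
print(f"a={a:.3f}  b={b:.3f}  test MSE={mse:.3f}")
```

Seeding the random number generators keeps the split reproducible. Shuffling before cutting prevents the split from simply following the original row order.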

First, let’s define some statistical terms. A population is the entire collection (or universe) of things under consideration. A sample is a portion of the population that we select for analysis. For example, we could start with a full dataset, break off a chunk of it as a sample, and do our training on that chunk. Another way to look at it is that the data we’re given to start with might itself be only a sample of a much broader dataset.
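To make the population/sample distinction concrete, here is a small Python sketch. It assumes a simulated population of 100,000 normally distributed values, with a mean of 50 and a standard deviation of 10; all of these figures are arbitrary. The sketch draws a simple random sample of 1,000 values without replacement and compares the sample mean with the population mean.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical population: 100,000 simulated measurements (mean 50, sd 10).
population = [rng.gauss(50, 10) for _ in range(100_000)]

# Simple random sample: 1,000 members drawn without replacement.
sample = rng.sample(population, 1_000)

pop_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)
print(f"population mean: {pop_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

The two means usually differ only slightly. This is why a well-drawn sample can stand in for the full population.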

Polling data is an example of sampling; it is typically gathered by asking questions of people in specific demographic groups. By design, polling data can cover only a subset of a country's general population, because it would be quite an achievement to ask everyone in a country what their favorite color is. If we have a country with a population of 100 million and we conduct a poll that has 30 million ...
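A poll can be simulated the same way. The sketch below assumes a hypothetical country of 1,000,000 people in which exactly 30% name blue as their favorite color; both figures are invented for illustration. It polls 2,000 of them at random and estimates the share from the poll alone. The margin of error uses the standard normal approximation for a proportion: about 1.96 standard errors for 95% confidence.

```python
import math
import random

rng = random.Random(2024)

# Hypothetical country of 1,000,000 people; exactly 30% prefer blue.
population = ["blue"] * 300_000 + ["other"] * 700_000

# Poll a simple random sample of 2,000 people (without replacement).
poll = rng.sample(population, 2_000)

# Estimated share of the population that prefers blue, from the poll alone.
p_hat = poll.count("blue") / len(poll)

# Approximate 95% margin of error via the normal approximation.
margin = 1.96 * math.sqrt(p_hat * (1 - p_hat) / len(poll))

print(f"estimated share preferring blue: {p_hat:.3f} ± {margin:.3f}")
```

Polling only 0.2% of this simulated population typically recovers the true 30% share to within about two percentage points.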
