Introduction to Machine Learning with R

by Scott V. Burger
March 2018
Beginner to intermediate
223 pages
5h 38m
English
O'Reilly Media, Inc.

Chapter 3. Sampling Statistics and Model Training in R

Sampling and machine learning go hand in hand. In machine learning, we typically begin with a big dataset that we want to use for predicting something. We usually split this data into a training set and a test set, build a model on the training set, and then unleash the fully trained model on the test set to see how well it predicts. In some instances, for example when the data is very large, it might be very difficult to run a machine learning model on the entire dataset, whereas we might achieve comparable accuracy by training on a small sample of it and testing when appropriate.
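
The split described above can be sketched in base R. This is only an illustration, not the book's own code: the built-in mtcars dataset, the 80/20 ratio, the seed, and the choice of a linear model of mpg on wt are all assumptions made for the example.

```r
# Minimal sketch: 80/20 train/test split on the built-in mtcars dataset.
# The dataset, split ratio, seed, and model are illustrative assumptions.
set.seed(123)                       # make the random split reproducible

n <- nrow(mtcars)                   # number of rows in the full dataset
train_idx <- sample(seq_len(n), size = floor(0.8 * n))  # rows used for training

train <- mtcars[train_idx, ]        # training set
test  <- mtcars[-train_idx, ]       # everything else is held out for testing

# Fit a simple linear model on the training rows only
fit <- lm(mpg ~ wt, data = train)

# Score the held-out rows with root mean squared error (RMSE)
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$mpg - pred)^2))
print(rmse)
```

Because the test rows never take part in fitting, the RMSE gives a rough idea of how the model behaves on data it has not seen.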

First let’s define some statistical terms. A population is the entire collection (or universe) of things under consideration. A sample is a portion of the population that we select for analysis. So, for example, we could start with a full dataset, break off a chunk into a sample, and do our training there. Another way to look at it is that some data that we’re given to start with might itself be only a sample of a much broader dataset.
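
To make the distinction concrete, here is a small sketch in base R that treats a simulated vector as the population and draws a sample from it. The population size, the normal distribution and its parameters, and the sample size are all hypothetical values chosen for the example.

```r
# Minimal sketch: a simulated "population" and a random sample drawn from it.
# All sizes and distribution parameters here are illustrative assumptions.
set.seed(42)                        # reproducible draws

# Hypothetical population: 100,000 simulated measurements
population <- rnorm(100000, mean = 50, sd = 10)

# Sample: 1,000 values selected from the population without replacement
smp <- sample(population, size = 1000)

# Compare a statistic computed on the sample with the population value
pop_mean    <- mean(population)
sample_mean <- mean(smp)

cat("Population mean:", pop_mean, "\n")
cat("Sample mean:    ", sample_mean, "\n")
```

The sample mean will typically land close to, but not exactly on, the population mean, which is the basic trade-off of working with a sample instead of the full data.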

Polling data is an example of sampling, and is typically gathered by asking questions of people in specific demographics. By design, polling data can be only a subset of the general population of a country, because it would be quite an achievement to ask everyone in a country what their favorite color might be. If we have a country with a population of 100 million and we conduct a poll that has 30 million ...

ISBN: 9781491976432