Practical Predictive Analytics by Ralph Winters

Comparing a sample to the population

To illustrate some of the benefits of sampling, and to see how a sample can often produce results close to those of a much larger population, copy the following code and run it as an R script. The script generates a 15,000,000-row population and then extracts a 100-row random sample, so that we can compare the results:

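    # build the population: 5,000,000 "Male" rows followed by 10,000,000 "Female" rows;
    # the 24 purchase values are recycled to fill all 15,000,000 rows
    large.df <- data.frame(
      gender = c(rep(c("Male", "Female", "Female"), each = 5000000)),
      purchases = c(0:9, 0:5, 0:7)
    )

    # take a small sample
    y <- large.df[sample(nrow(large.df), 100), ]

    # compare the population mean with the sample mean
    mean(large.df$purchases)
    mean(y$purchases)

    # Render 2 plots side-by-side by setting the plot frame to 1 by 2
    par(mfrow = c(1, 2))
    barplot(table(y$gender) / sum(table(y$gender)))
    barplot(table(large.df$gender) / sum(table(large.df$gender)))
    ...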
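To put a number on how close the sample gets, the following is a minimal sketch that is not part of the book's script. It assumes a smaller stand-in population of 150,000 rows so that it runs quickly, fixes an arbitrary seed so the draw is repeatable, and uses the illustrative names pop.df and smp. It reports the standard error of the sample mean alongside the two means, and the gender shares as numbers rather than bar plots.

```r
# Assumption: a 150,000-row stand-in for the 15,000,000-row population,
# built the same way (one third "Male", two thirds "Female").
set.seed(42)  # arbitrary seed, used only to make the sample repeatable

pop.df <- data.frame(
  gender    = rep(c("Male", "Female", "Female"), each = 50000),
  purchases = c(0:9, 0:5, 0:7)  # 24 values, recycled across all rows
)

# Draw a 100-row simple random sample (without replacement)
smp <- pop.df[sample(nrow(pop.df), 100), ]

pop.mean <- mean(pop.df$purchases)
smp.mean <- mean(smp$purchases)

# Standard error of the sample mean, estimated from the sample itself
smp.se <- sd(smp$purchases) / sqrt(nrow(smp))

# Share of each gender in the population and in the sample
pop.share <- prop.table(table(pop.df$gender))
smp.share <- prop.table(table(smp$gender))

print(c(population = pop.mean, sample = smp.mean, std.error = smp.se))
print(pop.share)
print(smp.share)
```

With a random sample of this size, the sample mean will typically fall within about two standard errors of the population mean, which is one way to judge whether the sample is "close enough" for the question at hand.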
