Practical Data Analysis Cookbook

by Tomasz Drabas
April 2016
Content level: Beginner to intermediate
384 pages
8h 36m
English
Packt Publishing
Content preview from Practical Data Analysis Cookbook

Sampling the data

Sometimes the dataset we have is too large to build a model on directly. For practical reasons (so that estimating our models does not take forever), it helps to draw a stratified sample from the full dataset, that is, a sample that keeps each group (stratum) in roughly the same proportion as in the original data.

In this recipe, we will read from our MongoDB database and use Python to create a sample.

Getting ready

To execute this recipe, you will need PyMongo, pandas, and NumPy. No other prerequisites are required.

How to do it…

There are two approaches that one can take: either specify the fraction of the original dataset (say, 20%) or specify the number of records one would like to retrieve from the dataset. The following code shows you how to fetch a fraction of the dataset (the data_sampling.py file):

strata_frac = 0.2 ...
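
The listing above is cut off in this preview, so the following is only a minimal sketch of the two approaches described (sampling by fraction and sampling by record count), not the book's data_sampling.py. It assumes the records have already been loaded into a pandas DataFrame; the column name `stratum`, the helper names, and the demo data are all hypothetical.

```python
import numpy as np
import pandas as pd


def sample_by_fraction(df, strata_col, frac, seed=42):
    """Draw the same fraction of rows from every stratum (hypothetical helper)."""
    parts = [
        grp.sample(frac=frac, random_state=seed)
        for _, grp in df.groupby(strata_col)
    ]
    return pd.concat(parts)


def sample_by_count(df, strata_col, n, seed=42):
    """Draw roughly n rows in total, allocated in proportion to stratum size.

    Rounding the per-stratum allocation means the total can differ
    from n by a few rows.
    """
    counts = df[strata_col].value_counts()
    alloc = (counts / counts.sum() * n).round().astype(int)
    parts = [
        grp.sample(n=min(int(alloc[key]), len(grp)), random_state=seed)
        for key, grp in df.groupby(strata_col)
    ]
    return pd.concat(parts)


# Made-up demo data: three strata of 500, 300, and 200 rows.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "stratum": ["a"] * 500 + ["b"] * 300 + ["c"] * 200,
    "value": rng.normal(size=1000),
})

strata_frac = 0.2
by_frac = sample_by_fraction(demo, "stratum", strata_frac)
by_count = sample_by_count(demo, "stratum", 100)

print(by_frac["stratum"].value_counts())
print(by_count["stratum"].value_counts())
```

Sampling within each stratum, rather than from the whole DataFrame at once, is what keeps the group proportions of the sample close to those of the full dataset.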

Publisher Resources

ISBN: 9781783551668