Practical Data Analysis Cookbook

by Tomasz Drabas
April 2016
Content level: Beginner to intermediate
384 pages
8h 36m
English
Packt Publishing

Splitting the dataset into training, cross-validation, and testing

To trust a statistical model, we need confidence that it accurately abstracts the phenomenon we are dealing with. To gain that confidence, we need to test the model and see how well it performs. We cannot assess its accuracy on the same dataset that we used for training, because a model can fit the data it has already seen closely and still generalize poorly to new observations.

In this recipe, you will learn how to quickly split your dataset into two subsets: one used solely to train the model and the other used to test it.

Getting ready

To execute this recipe, you will need pandas, SQLAlchemy, and NumPy. No other prerequisites are required.

How to do it…

We read our data from the PostgreSQL database and store it in the data DataFrame. ...
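As a minimal sketch of the kind of split this recipe describes (not necessarily the book's own code), the following assumes a small synthetic DataFrame standing in for the data read from PostgreSQL. The helper name split_data, the two-thirds training fraction, the seed, and the column names are all hypothetical choices made for illustration.

```python
import numpy as np
import pandas as pd


def split_data(data, train_fraction=2 / 3, seed=42):
    """Randomly split a DataFrame into training and testing subsets.

    Hypothetical helper: the name, default fraction, and seed are
    illustrative assumptions, not taken from the book.
    """
    rng = np.random.default_rng(seed)

    # Shuffle the row positions so the split does not depend on row order.
    shuffled = rng.permutation(len(data))

    # The first n_train shuffled positions form the training subset.
    n_train = int(round(train_fraction * len(data)))
    train = data.iloc[shuffled[:n_train]]
    test = data.iloc[shuffled[n_train:]]
    return train, test


# Synthetic stand-in for the DataFrame loaded from the database.
data = pd.DataFrame({
    "x": np.arange(100),
    "y": np.arange(100) % 2,
})

train, test = split_data(data)
print(len(train), len(test))
```

Shuffling row positions and slicing gives subsets of exact, predictable sizes, and every row lands in exactly one subset. Fixing the seed makes the split reproducible between runs.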


ISBN: 9781783551668