April 2016
Beginner to intermediate
384 pages
8h 36m
English
To trust a statistical model, we need confidence that it accurately abstracts the phenomenon we are modeling. We gain that confidence by testing the model to see whether it performs well. To assess its accuracy, however, we cannot use the same dataset that we used for training: a model evaluated on data it has already seen will tend to look more accurate than it is on new data.
In this recipe, you will learn how to quickly split your dataset into two subsets: one used solely to train the model and the other used to test it.
To execute this recipe, you will need pandas, SQLAlchemy, and NumPy. No other prerequisites are required.
We read our data from the PostgreSQL database and store it in the data DataFrame. ...
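Once the data is in a DataFrame, the train/test split described above can be sketched roughly as follows. This is a minimal illustration, not the recipe's own code: the toy DataFrame stands in for the table read from PostgreSQL, and the `split_data` helper, the column names, the 80/20 fraction, and the seed are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def split_data(data, train_fraction=0.8, seed=42):
    """Randomly split a DataFrame into (train, test) by row.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    rng = np.random.default_rng(seed)
    # Draw one uniform number in [0, 1) per row; rows whose draw falls
    # below the threshold go to the training set, the rest to the test set.
    mask = rng.random(len(data)) < train_fraction
    return data[mask], data[~mask]


# Toy stand-in for the table read from the database (made-up values)
data = pd.DataFrame({
    "x": np.arange(1000),
    "y": np.arange(1000) % 2,
})

train, test = split_data(data)
print(len(train), len(test))
```

Because each row is assigned independently, the training set holds approximately, not exactly, 80% of the rows. Every row lands in exactly one of the two subsets, so the model is never tested on data it was trained on.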