Effective Amazon Machine Learning by Alexis Perrier

Assessing our predictions

Since we know the real class of our held-out samples, we can calculate the ROC-AUC score and other metrics on the predictions and check how close this score is to the validation score. Assuming that our data subsets have very similar distributions, the two scores should end up very close; any difference should come only from randomness in the samples drawn for the validation and held-out sets.

The following Python script uses the scikit-learn library (http://scikit-learn.org/) as well as the pandas library. Calculating the AUC score of the model on that prediction dataset takes only a few lines of Python. First, download the gzipped prediction file from S3; then, in a Python notebook or console, run the following:

import pandas as pd
from sklearn import ...
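As a rough illustration of that calculation, here is a minimal sketch, not the author's script. It assumes the prediction file has a trueLabel column holding the real 0/1 class and a score column holding the predicted probability of class 1; these column names, the file name and the 0.5 cutoff used for accuracy are assumptions, so adjust them to match the actual header of your file. A tiny in-memory frame stands in for the downloaded data so the example runs on its own.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, roc_auc_score


def assess_predictions(df, threshold=0.5):
    """Compare predicted scores with the known classes of held-out samples.

    Assumes (hypothetically) a 'trueLabel' column of 0/1 classes and a
    'score' column with the predicted probability of class 1.
    """
    y_true = df["trueLabel"]
    y_score = df["score"]
    # Convert probabilities into hard 0/1 predictions at the chosen cutoff
    y_pred = (y_score >= threshold).astype(int)
    return {
        # AUC uses the raw scores, so it does not depend on the cutoff
        "auc": roc_auc_score(y_true, y_score),
        # Accuracy uses the thresholded predictions
        "accuracy": accuracy_score(y_true, y_pred),
    }


# With the real file, pandas infers gzip compression from the .gz extension:
# df = pd.read_csv("predictions.csv.gz")  # hypothetical filename
df = pd.DataFrame({
    "trueLabel": [0, 0, 1, 1],
    "score": [0.1, 0.4, 0.35, 0.8],
})
scores = assess_predictions(df)
print(f"AUC: {scores['auc']:.2f}, accuracy: {scores['accuracy']:.2f}")
# prints "AUC: 0.75, accuracy: 0.75"
```

The AUC printed for the real held-out file is the number to compare with the validation AUC reported for the model; if the subsets share a similar distribution, the two should be close.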
