Click-through prediction with decision tree

After several examples, it is time to predict ad click-through with the decision tree algorithm we have just learned and practiced. We will use the dataset from the Kaggle machine learning competition Click-Through Rate Prediction (https://www.kaggle.com/c/avazu-ctr-prediction).

For now, we take only the first 100,000 samples from the train file (unzipped from train.gz, available at https://www.kaggle.com/c/avazu-ctr-prediction/data) to train the decision tree, and the first 100,000 samples from the test file (unzipped from test.gz on the same page) for prediction.
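Taking only the first rows of a large file can be sketched as follows. This is a minimal illustration, not necessarily the loading code used later in the chapter: the helper name read_first_rows is hypothetical, and the sketch assumes the unzipped files are comma-separated text with a header line and are saved locally under the names train and test.

```python
import csv
from itertools import islice


def read_first_rows(path, n_rows):
    """Return the first n_rows data rows of a CSV file as a list of dicts.

    The header line supplies the dict keys and is not counted as a row.
    All values are kept as strings.
    """
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        # islice stops after n_rows, so the rest of the file is never read
        return list(islice(reader, n_rows))


# Hypothetical usage; the local file names "train" and "test" are assumptions:
# n = 100_000
# train_rows = read_first_rows("train", n)
# test_rows = read_first_rows("test", n)
```

Because the rows are read lazily and the reader stops after the requested count, the full file never has to fit in memory. Values stay as strings, which also suits very long identifiers that would not fit in a 64-bit integer.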

The data fields are described as follows:

  • id: ad identifier, such as 1000009418151094273, 10000169349117863715 ...
