Analyzing supermarket shopping baskets

As an example, we will look at dataset consisting of anonymous transactions at a supermarket in Belgium. This dataset was made available by Tom Brijs at Hasselt University. Because of privacy concerns, the data has been anonymized, so we only have a number for each product, and each basket therefore consists of a set of numbers. The data file is available from several online sources (including this book's companion website).

We begin by loading the dataset and looking at some statistics (this is always a good idea):

from collections import defaultdict from itertools import chain # File is downloaded as a compressed file import gzip # file format is a line per transaction # of the form '12 34 342 5...' ...

Get Building Machine Learning Systems with Python - Third Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.