Python Cookbook

by Alex Martelli, David Ascher

July 2002

Intermediate to advanced

608 pages

English

O'Reilly Media, Inc.

Processing Selected Pairs of Structured Data Efficiently

Credit: Alex Martelli, David Ascher

Problem

You need to efficiently process pairs of data from two large and related data sets.

Solution

Preprocess one of the data sets into an auxiliary dictionary keyed on the matching field, so that you no longer iterate over mostly irrelevant data. For instance, if xs and ys are the two data sets, and the first item of each entry is its key, so that x[0] == y[0] defines an “interesting” pair:

auxdict = {}
for y in ys:
    auxdict.setdefault(y[0], []).append(y)
# .get avoids a KeyError for entries of xs that have no match in ys
result = [process(x, y) for x in xs for y in auxdict.get(x[0], ())]
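To see the idea run end to end, here is a minimal sketch under stated assumptions: the sample contents of xs and ys, the helper name process_matching_pairs, and the body of process are all hypothetical and chosen only for illustration. It uses dict.get with an empty default so that entries of xs without a match in ys are simply skipped.

```python
# Hypothetical sample data: each entry is a tuple whose first item is the key.
xs = [("a", 1), ("b", 2), ("c", 3)]
ys = [("a", "x"), ("a", "y"), ("c", "z")]


def process(x, y):
    # Placeholder for the real per-pair computation (an assumption).
    return (x[0], x[1], y[1])


def process_matching_pairs(xs, ys, process):
    # Group the entries of ys by key in a single pass.
    auxdict = {}
    for y in ys:
        auxdict.setdefault(y[0], []).append(y)
    # For each x, visit only the ys that share its key; an x with no
    # match contributes nothing because .get returns an empty tuple.
    return [process(x, y) for x in xs for y in auxdict.get(x[0], ())]


result = process_matching_pairs(xs, ys, process)
print(result)  # [('a', 1, 'x'), ('a', 1, 'y'), ('c', 3, 'z')]
```

The entry ("b", 2) has no partner in ys, so it produces no pairs, while "a" matches twice and produces two.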

Discussion

To make the problem more concrete, let’s look at an example. Say you need to analyze data about visitors to a web site who have purchased something online. This means you need to perform some computation based on data from two log files—one from the web server and one from the credit-card processing framework. Each log file is huge, but only a small number of the web server log entries correspond to credit-card log entries. Let’s assume that cclog is a sequence of records, one for each credit-card transaction, and that weblog is a sequence of records describing each web site hit. Let’s further assume that each record uses the attribute ipaddress to refer to the IP address involved in each event. In this case, a reasonable first approach would be to do something like:

results = [ process(webhit, ccinfo) for webhit in weblog for ccinfo in cclog \
            if ccinfo.ipaddress==webhit.ipaddress ...
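The preview cuts off at this point, so the following is not the authors' continuation. It is only a sketch of how the Solution's auxiliary dictionary might be applied to the log example, next to the nested-loop first approach. The record types WebHit and CCInfo, their url and amount fields, the sample addresses, and the body of process are all assumptions; only the ipaddress attribute and the names weblog and cclog come from the text.

```python
from collections import namedtuple

# Hypothetical record types; the text only says each record has .ipaddress.
WebHit = namedtuple("WebHit", "ipaddress url")
CCInfo = namedtuple("CCInfo", "ipaddress amount")

weblog = [
    WebHit("192.0.2.1", "/home"),
    WebHit("192.0.2.2", "/cart"),
    WebHit("192.0.2.3", "/about"),
    WebHit("192.0.2.2", "/checkout"),
]
cclog = [CCInfo("192.0.2.2", 19.99)]


def process(webhit, ccinfo):
    # Placeholder computation (an assumption).
    return (webhit.url, ccinfo.amount)


def pairs_naive(weblog, cclog):
    # First approach: compares every web hit with every transaction,
    # so the work grows with len(weblog) * len(cclog).
    return [process(w, c) for w in weblog for c in cclog
            if c.ipaddress == w.ipaddress]


def pairs_indexed(weblog, cclog):
    # Dictionary approach: index the transactions by IP address once,
    # then look up each web hit directly.
    ipdict = {}
    for c in cclog:
        ipdict.setdefault(c.ipaddress, []).append(c)
    return [process(w, c) for w in weblog
            for c in ipdict.get(w.ipaddress, ())]


naive = pairs_naive(weblog, cclog)
indexed = pairs_indexed(weblog, cclog)
print(naive == indexed)  # True
```

Both functions return the same pairs in the same order. The indexed version makes one pass over cclog and one over weblog, plus the work of processing the matches themselves, which is why it suits the case where only a few web hits correspond to transactions.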


ISBN: 0596001673
