Skip to Content
Python Cookbook
book

Python Cookbook

by Alex Martelli, David Ascher
July 2002
Intermediate to advanced
608 pages
15h 46m
English
O'Reilly Media, Inc.
Content preview from Python Cookbook

Processing Selected Pairs of Structured Data Efficiently

Credit: Alex Martelli, David Ascher

Problem

You need to efficiently process pairs of data from two large and related data sets.

Solution

Use an auxiliary dictionary to do preprocessing of the data, thereby reducing the need for iteration over mostly irrelevant data. For instance, if xs and ys are the two data sets, with matching keys as the first item in each entry, so that x[0] == y[0] defines an “interesting” pair:

auxdict = {}
for y in ys: auxdict.setdefault(y[0], []).append(y)
result = [ process(x, y) for x in xs for y in auxdict[x[0]] ]

Discussion

To make the problem more concrete, let’s look at an example. Say you need to analyze data about visitors to a web site who have purchased something online. This means you need to perform some computation based on data from two log files—one from the web server and one from the credit-card processing framework. Each log file is huge, but only a small number of the web server log entries correspond to credit-card log entries. Let’s assume that cclog is a sequence of records, one for each credit-card transaction, and that weblog is a sequence of records describing each web site hit. Let’s further assume that each record uses the attribute ipaddress to refer to the IP address involved in each event. In this case, a reasonable first approach would be to do something like:

results = [ process(webhit, ccinfo) for webhit in weblog for ccinfo in cclog \ if ccinfo.ipaddress==webhit.ipaddress ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Start your free trial

You might also like

Modern Python Cookbook - Second Edition

Modern Python Cookbook - Second Edition

Steven F. Lott
Python Cookbook, 3rd Edition

Python Cookbook, 3rd Edition

David Beazley, Brian K. Jones

Publisher Resources

ISBN: 0596001673Supplemental ContentCatalog PageErrata