Geometric Operations

Now that we have the data, what we do with it depends on the task at hand. An approach that has stood the test of time is to keep adding operations to the Dataset class, building up a veritable Swiss Army knife over time. Common families of operations include:

Field transformations

Applying functions to entire columns in order to format numbers and dates, switch encodings, or build database keys.

Row and column operations

Inserting, appending, and deleting whole columns; breaking a dataset into several separate datasets whenever a certain field changes; and sorting.

Filter operations

Extracting or dropping rows meeting user-defined criteria.

Geometric operations

Cross-tabulating, detabulating (see Figure 13.4), and transposing.

Storage operations

Loading from and saving to native Python data (marshal, cPickle), delimited text files, and fixed-width text files.
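To make a few of these families concrete, here is a minimal sketch in modern Python 3. It is not the Dataset class used in this chapter: the constructor, the method names (transform_field, select, transposed), and the sample rows are all assumptions chosen for illustration. It shows one field transformation, one filter operation, and one geometric operation.

```python
class Dataset:
    """Hypothetical minimal table: a tuple of field names plus a list of row tuples."""

    def __init__(self, fields, rows):
        self.fields = tuple(fields)
        self.rows = [tuple(row) for row in rows]

    def transform_field(self, name, func):
        """Field transformation: apply func to every value in one column, in place."""
        i = self.fields.index(name)
        self.rows = [row[:i] + (func(row[i]),) + row[i + 1:] for row in self.rows]

    def select(self, predicate):
        """Filter operation: return a new Dataset holding the rows that satisfy predicate.

        The predicate receives each row as a dict keyed by field name.
        """
        kept = [row for row in self.rows if predicate(dict(zip(self.fields, row)))]
        return Dataset(self.fields, kept)

    def transposed(self):
        """Geometric operation: swap rows and columns, header row included."""
        grid = [self.fields] + self.rows
        return [tuple(column) for column in zip(*grid)]


# Illustrative data only; the values echo the patient table used in this section.
ds = Dataset(("Patient", "X", "Y"),
             [("Patient 1", 0.55, 0.08), ("Patient 2", 0.54, 0.11)])
ds.transform_field("Patient", str.upper)   # reformat one whole column
high = ds.select(lambda r: r["X"] > 0.54)  # keep rows meeting a criterion
flipped = ds.transposed()                  # one tuple per original column
```

Keeping each operation as a small method on one class is what lets the toolkit grow incrementally: a new requirement usually means one more method, not a new program.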

Some of these operations are best understood diagrammatically. Consider the operation in Figure 13.4, which can’t be performed by SQL.

Figure 13.4. Detabulating and adding constant columns

This operation was a mainstay of the case study that follows. Once the correct operations have been created, it can be reduced to a piece of Python code:

>>> ds1.pp()  # presume we have the table above already
('Patient', 'X', 'Y', 'Z')
('Patient 1', 0.55, 0.08, 0.97)
('Patient 2', 0.54, 0.11, 0.07)
('Patient 3', 0.61, 0.08, 0.44)
...
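As a rough illustration of what detabulating involves, the following sketch uses plain functions in modern Python 3 and does not use the chapter's Dataset methods. It assumes that detabulating means turning each measurement column into a row of its own, keyed by the leading column, and that a constant column holds the same value in every row. The function names, the Measurement and Value field names, and the Study constant are hypothetical.

```python
def detabulate(fields, rows, key_count=1,
               name_field="Measurement", value_field="Value"):
    """Turn every non-key column into its own row: (keys..., column name, value)."""
    out_fields = tuple(fields[:key_count]) + (name_field, value_field)
    out_rows = []
    for row in rows:
        keys = tuple(row[:key_count])
        # Pair each remaining column heading with the value beneath it.
        for name, value in zip(fields[key_count:], row[key_count:]):
            out_rows.append(keys + (name, value))
    return out_fields, out_rows


def add_constant_columns(fields, rows, constants):
    """Append each name/value pair in constants to the header and to every row."""
    names = tuple(constants)
    values = tuple(constants.values())
    return tuple(fields) + names, [tuple(row) + values for row in rows]


fields = ("Patient", "X", "Y", "Z")
rows = [("Patient 1", 0.55, 0.08, 0.97),
        ("Patient 2", 0.54, 0.11, 0.07)]

long_fields, long_rows = detabulate(fields, rows)
long_fields, long_rows = add_constant_columns(
    long_fields, long_rows, {"Study": "S1"})  # "Study"/"S1" are made-up values

for row in [long_fields] + long_rows:
    print(row)
```

Each input row with three measurement columns becomes three output rows, so the result is longer and narrower than the original table.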
