that we have the data, what to do with it depends on the operation
taking place. An approach that has stood the test of time is to keep
adding operations to the Dataset class, building over time a veritable
Swiss army knife. Common families of operations can include:
Applying functions to entire columns in order to format numbers and dates, switch encodings, or build database keys.
Inserting, appending, and deleting whole columns; breaking a dataset into several separate datasets whenever a certain field changes; and sorting.
Extracting or dropping rows meeting user-defined criteria.
Cross-tabulating, detabulating (see Figure 13.4), and transposing.
Saving to native Python data (cPickle), delimited text files, and fixed-width text files.
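To make a few of these families concrete, here is a minimal sketch in modern Python. It is not the Dataset class described in this chapter: the class layout (a header tuple plus a list of row tuples) and the method names `apply_to_column`, `select`, and `transpose` are assumptions chosen purely for illustration.

```python
class Dataset:
    """Hypothetical minimal table: a header tuple plus a list of row tuples."""

    def __init__(self, header, rows):
        self.header = tuple(header)
        self.rows = [tuple(r) for r in rows]

    def apply_to_column(self, name, func):
        """Return a new Dataset with func applied to every value in one column."""
        i = self.header.index(name)
        rows = [r[:i] + (func(r[i]),) + r[i + 1:] for r in self.rows]
        return Dataset(self.header, rows)

    def select(self, predicate):
        """Return a new Dataset keeping rows for which predicate(row_dict) is true."""
        rows = [r for r in self.rows if predicate(dict(zip(self.header, r)))]
        return Dataset(self.header, rows)

    def transpose(self):
        """Swap rows and columns, treating the header as the first row."""
        table = [self.header] + self.rows
        flipped = list(zip(*table))
        return Dataset(flipped[0], flipped[1:])


ds = Dataset(("Patient", "X", "Y"),
             [("Patient 1", 0.55, 0.08), ("Patient 2", 0.54, 0.11)])

percent = ds.apply_to_column("X", lambda v: round(v * 100))  # format a column
high_y = ds.select(lambda row: row["Y"] > 0.1)               # extract rows
flipped = ds.transpose()                                     # rows <-> columns
```

Each method returns a new object rather than modifying the original, which keeps chained operations easy to reason about; a real implementation might well choose to mutate in place instead.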
Some of these operations are best understood diagrammatically. Consider the operation in Figure 13.4, which can’t be performed by SQL.
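As a rough illustration, assume that detabulating means turning every non-key column into a row of its own (often called unpivoting). Under that assumption, a standalone sketch might look like the following. The function name `detabulate`, the `key_count` parameter, and the output column names `Column` and `Value` are hypothetical and are not taken from the Dataset class itself.

```python
def detabulate(header, rows, key_count=1):
    """Unpivot a table: emit one row per (key columns, column name, value).

    The first key_count columns identify the record; every remaining
    column becomes a separate output row. Names here are illustrative.
    """
    keys, measures = header[:key_count], header[key_count:]
    out_header = tuple(keys) + ("Column", "Value")
    out_rows = []
    for row in rows:
        for name, value in zip(measures, row[key_count:]):
            out_rows.append(tuple(row[:key_count]) + (name, value))
    return out_header, out_rows


header = ("Patient", "X", "Y", "Z")
rows = [("Patient 1", 0.55, 0.08, 0.97),
        ("Patient 2", 0.54, 0.11, 0.07)]

long_header, long_rows = detabulate(header, rows)
for r in long_rows:
    print(r)
```

Two wide rows with three measurement columns each become six narrow rows, one per patient and measurement.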
This operation was a mainstay of the case study that follows. Once the correct operations have been created, it can be reduced to a piece of Python code:
>>> ds1.pp()  # presume we have the table above already
('Patient', 'X', 'Y', 'Z')
('Patient 1', 0.55, 0.08, 0.97)
('Patient 2', 0.54, 0.11, 0.07)
('Patient 3', 0.61, 0.08, 0.44)
...