Geometric Operations
Now that we have the data, what to do with it depends on the operation taking place. An approach that has stood the test of time is to keep adding operations to the Dataset class, building over time a veritable Swiss army knife. Common families of operations can include:
- Field transformations
Applying functions to entire columns in order to format numbers and dates, switch encodings, or build database keys.
- Row and column operations
Inserting, appending, and deleting whole columns, breaking into several separate datasets whenever a certain field changes, and sorting operations.
- Filter operations
Extracting or dropping rows meeting user-defined criteria.
- Geometric operations
Cross-tabulate, detabulate (see Figure 13.4), and transpose.
- Storage operations
Load and save to native Python data (marshal, cPickle), delimited text files, and fixed-width text files.
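As a rough illustration of how a few of these families might hang off a single class, here is a minimal sketch. It is not the book's Dataset implementation: the constructor signature and the method names transform_field, filter_rows, and transpose are hypothetical, and a dataset is assumed to be nothing more than a list of field names plus a list of row tuples.

```python
class Dataset:
    """Hypothetical minimal dataset: field names plus a list of row tuples."""

    def __init__(self, fields, rows):
        self.fields = list(fields)
        self.rows = [tuple(row) for row in rows]

    def transform_field(self, name, func):
        """Field transformation: apply func to every value in one column."""
        i = self.fields.index(name)
        self.rows = [row[:i] + (func(row[i]),) + row[i + 1:] for row in self.rows]

    def filter_rows(self, predicate):
        """Filter operation: new Dataset of the rows where predicate(row_dict) is true."""
        kept = [row for row in self.rows if predicate(dict(zip(self.fields, row)))]
        return Dataset(self.fields, kept)

    def transpose(self):
        """Geometric operation: swap rows and columns, header row included."""
        table = [tuple(self.fields)] + self.rows
        flipped = list(zip(*table))
        return Dataset(flipped[0], flipped[1:])


# Illustrative data only.
ds = Dataset(("Patient", "X", "Y"),
             [("Patient 1", 0.55, 0.08), ("Patient 2", 0.54, 0.11)])
ds.transform_field("X", lambda v: round(v * 100))   # rescale one column in place
high_y = ds.filter_rows(lambda r: r["Y"] > 0.1)     # keep rows with Y above 0.1
flipped = ds.transpose()                            # columns become rows
```

Here the transformation changes the dataset in place while the filter and transpose return new objects. That is one possible design choice, not something the text prescribes.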
Some of these operations are best understood diagrammatically. Consider the operation in Figure 13.4, which can’t be performed by SQL.
This operation was a mainstay of the case study that follows. Once the correct operations have been created, it can be reduced to a piece of Python code:
>>> ds1.pp()    # presume we have the table above already
('Patient', 'X', 'Y', 'Z')
('Patient 1', 0.55, 0.08, 0.97)
('Patient 2', 0.54, 0.11, 0.07)
('Patient 3', 0.61, 0.08, 0.44)
...
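The rest of the session is elided here, so the following is only a sketch of what a detabulate and cross-tabulate pair could look like, not the book's code. It assumes that detabulating means turning each cell of the wide table into its own (key, column name, value) row, and that cross-tabulating is the inverse. The function names and the 'Variable' and 'Value' headings are hypothetical.

```python
def detabulate(fields, rows):
    """Turn a wide table into (key, column name, value) triples."""
    key_field, value_fields = fields[0], fields[1:]
    long_rows = []
    for row in rows:
        key = row[0]
        for name, value in zip(value_fields, row[1:]):
            long_rows.append((key, name, value))
    # 'Variable' and 'Value' are made-up headings for the new columns.
    return (key_field, "Variable", "Value"), long_rows


def cross_tabulate(fields, long_rows):
    """Inverse: rebuild the wide table from (key, column name, value) triples."""
    columns = []   # column names in first-seen order
    by_key = {}    # key -> {column name: value}; dicts keep insertion order
    for key, name, value in long_rows:
        if name not in columns:
            columns.append(name)
        by_key.setdefault(key, {})[name] = value
    wide_fields = (fields[0],) + tuple(columns)
    # A missing (key, column) combination becomes None.
    wide_rows = [(key,) + tuple(cells.get(c) for c in columns)
                 for key, cells in by_key.items()]
    return wide_fields, wide_rows


# The rows printed in the session above.
fields = ("Patient", "X", "Y", "Z")
rows = [("Patient 1", 0.55, 0.08, 0.97),
        ("Patient 2", 0.54, 0.11, 0.07),
        ("Patient 3", 0.61, 0.08, 0.44)]

long_fields, long_rows = detabulate(fields, rows)
wide_fields, wide_rows = cross_tabulate(long_fields, long_rows)
```

With these three patients, long_rows holds nine triples, one per measurement, and cross-tabulating them gives back the original rows.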