Part II. Tactics: Analytic Patterns

Now that you’ve met the fundamental analytic machinery (in both its MapReduce and table-operation forms), it’s time to put it to work.

This part of the book will equip you to think tactically (i.e., in terms of the changes you would like to make to the data). Each chapter introduces a repeatedly useful data transformation pattern, demonstrated in Pig (and, where we’d like to reinforce the record-by-record action, in Python as well).

One of this book’s principles is to center demonstrations on an interesting and realistic problem from some domain. And whenever possible, we endeavor to indicate how the approach would extend to other domains, especially ones with an obvious business focus. The tactical patterns, however, are exactly those tools that crop up in nearly every domain: think of them as the screwdriver, torque wrench, lathe, and so forth of your toolkit. Now, if this book were called Big Mechanics for Chimps, we might introduce those tools by repairing and rebuilding a Volkswagen Beetle engine, or by building another lathe from scratch. Those lessons would carry over to anywhere machine tools apply: air conditioner repair, fixing your kid’s bike, or building a rocketship to Mars.

So we will focus this part of the book on the dataset we just introduced, what Nate Silver calls “the perfect dataset”: the sea of numbers surrounding the sport of baseball. The members of the Retrosheet and Baseball Databank projects have provided ...
