Skip to Content
Programming Pig
book

Programming Pig

by Alan Gates
October 2011
Intermediate to advanced content levelIntermediate to advanced
220 pages
6h 25m
English
O'Reilly Media, Inc.
Content preview from Programming Pig

Preface

Data Addiction

Data is addictive. Our ability to collect and store data has grown massively in the last several decades. Yet our appetite for ever more data shows no sign of being satiated. Scientists want to be able to store more data in order to build better mathematical models of the world. Marketers want better data to understand their customers’ desires and buying habits. Financial analysts want to better understand the workings of their markets. And everybody wants to keep all their digital photographs, movies, emails, etc.

The computer and Internet revolutions have greatly increased our ability to collect and store data. Before these revolutions, the US Library of Congress was one of the largest collections of data in the world. It is estimated that its printed collections contain approximately 10 terabytes (TB) of information. Today large Internet companies collect that much data on a daily basis. And it is not just Internet applications that are producing data at prodigious rates. For example, the Large Synoptic Survey Telescope (LSST) planned for construction in Chile is expected to produce 20 TB of data every day.

Part of the reason for this massive growth in data is our ability to collect much more data. Every time someone clicks on a website’s links, the web server can record information about what page the user was on and which link he clicked. Every time a car drives over a sensor in the highway, its speed can be recorded. But much of the reason is also our ability ...

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Start your free trial

You might also like

Programming Pig, 2nd Edition

Programming Pig, 2nd Edition

Alan Gates, Daniel Dai
Pig Design Patterns

Pig Design Patterns

Pradeep Pasupuleti
Apache Hadoop™ YARN: Moving beyond MapReduce and Batch Processing with Apache Hadoop™ 2

Apache Hadoop™ YARN: Moving beyond MapReduce and Batch Processing with Apache Hadoop™ 2

Arun C. Murthy, Vinod Kumar Vavilapalli, Doug Eadline, Joseph Niemiec, Jeff Markham

Publisher Resources

ISBN: 9781449317881Errata Page