CHAPTER 2
The Fundamentals of Contemporary Data
A Primer on What It Is, Why It Matters, and How to Get It

We are drowning in information and starving for knowledge.

—Rutherford D. Rogers

When you refer to a car, most people know what you mean. Sure, depending on where you live, you may call it an automobile, voiture, or coche. That’s not to say, though, that all cars are created equal. They are not. Souped-up Porsches and Lamborghinis run hundreds of thousands of dollars, while used Pintos cost mere hundreds. Some cars run only on diesel fuel, and an increasing number require no fuel at all—and then there are the hybrids. More distinctions are on their way. Soon some cars will drive themselves. Others already do—sometimes. Tesla currently sports a semiautonomous model, the Model S.


But does the same hold true for data? Does everyone immediately know what you mean when you use the term?

For those looking to really apply analytics, the umbrella term “data” represents an ultimately unfulfilling starting point. The type of data at your disposal governs much of what you can do with it and how you need to do it. Against that backdrop, when we talk about data, we’re really referring to four different kinds:

  1. Structured
  2. Semistructured
  3. Unstructured
  4. Metadata
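
To make the four kinds a little more concrete, here is a minimal Python sketch. Every sample value in it (the employee rows, the user records, the review sentence, and the metadata fields) is invented purely for illustration, and the working definitions in the comments are simplifying assumptions rather than the book's own formal definitions.

```python
import csv
import io
import json

# Structured: every record has the same named fields, like spreadsheet rows.
# (Hypothetical sample data.)
structured_csv = "employee_id,name,salary\n1,Ana,52000\n2,Raj,61000\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))
total_salary = sum(int(row["salary"]) for row in rows)

# Semistructured: records carry their own labels, but their shape can vary.
# Here one record has "tags" and the other has a nested "location".
semistructured = json.loads(
    '[{"user": "ana", "tags": ["sales"]},'
    ' {"user": "raj", "location": {"city": "Austin"}}]'
)
users_with_tags = [rec["user"] for rec in semistructured if "tags" in rec]

# Unstructured: free text with no predefined fields to query directly.
unstructured = "Loved the product, but shipping took two weeks."
word_count = len(unstructured.split())

# Metadata: data that describes other data (here, the review text above).
metadata = {
    "source": "customer_review",          # hypothetical label
    "length_chars": len(unstructured),
    "word_count": word_count,
}

print(total_salary, users_with_tags, word_count, metadata)
```

Notice how the work differs by kind: the structured rows can be summed immediately, the semistructured records have to be checked for which fields they contain, and the unstructured text needs some processing before it yields even a simple count.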

Let’s briefly explore each.


When laypersons and even many professionals think of data, they usually picture Microsoft Excel. Perhaps they conjure up lists of sales, leads, employees, paychecks, and transactions. ...
