Chapter 3. Data Collection

Errors using inadequate data are much less than those using no data at all.

Charles Babbage

It’s difficult to imagine the power that you’re going to have when so many different sorts of data are available.

Tim Berners-Lee

In the previous chapter, I covered data quality and collecting data right. In this chapter, we switch focus to choosing the right data sources to consume and provision to the analysts. That is, collecting the right data. I’ll cover such topics as prioritizing which data sources to consume, how to collect the data, and how to assess the value that the data provides to the organization.

Collect All the Things

Imagine that you are rolling out a new checkout process on your website. You will want to know exactly how it is performing against your metrics—you will want to track conversion, basket size, and so on—but it will also be instructive and insightful to understand how it is being used. For instance, on some sites, “add to cart” is a painless single click, so a pattern of customer behavior might be to add a bunch of items to the cart as a holding area and then prune that down to their final choices before clicking the checkout submit button. On other sites, however, “add to cart” might involve multiple clicks, and removing items might be harder or ambiguous—in short, there is more friction—so that customers essentially need to make their final decision before adding items to the cart. You can see why instrumenting the checkout ...

Get Creating a Data-Driven Organization now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.