Chapter 3. The Building Blocks of a Data Analysis System

Data projects can be complicated, but they needn’t be. The range of possible outcomes from working with data is practically limitless. Knowing the different stages of a data project lets you break down that complexity and make the work more manageable. The aim of this chapter is to equip you to play an active role in any data project and help guide it through to delivery.

This chapter will work through the common stages of a data project:

  • Sourcing data through extraction from your systems or acquisition from third parties

  • Storing data at all project stages and for the long term

  • Curating and enriching data

  • Exploring and analyzing data sets

  • Sharing the data products created by your project

The overarching theme of this chapter is the importance of identifying the problem you are trying to solve with a data project. Staying focused on the problem matters because you are likely to hit roadblocks and will need alternative routes to a solution. Allow yourself room to pivot as you learn from the data and your analysis, rather than holding steadfastly to what was set out in a requirements document written long ago.

Data Extraction and Acquisition

It may go without saying, but you can’t have a data project without data, or at least a plan for getting hold of some. There are three main ways to get data: extraction, acquisition, and creation. Here, we will focus ...
