Chapter 4: Sourcing the Data
The first step in creating a new data pipeline is sourcing the raw dataset. Scoping and defining the dataset are crucial parts of any data pipeline project, but the framework for gathering that information is already well established in general project management and the underlying Agile framework. In this chapter, we will therefore begin at the point where the initial requirements have been defined and understood.
We will focus on methods for accessing data from internal sources, freely available public sources, and application programming interfaces (APIs) that have security applied.
We will also discuss some methods for validating the data sources you connect to and ensuring that the ...