4 Getting data into the platform
This chapter covers
- Understanding databases, files, APIs, and streams
- Ingesting data from RDBMSs using SQL versus change data capture
- Parsing and ingesting data from various file formats
- Developing strategies to deal with source schema changes
- Designing an ingestion pipeline to handle the challenges of data streams
- Building an ingestion pipeline for SaaS data
- Implementing quality control and monitoring in your ingestion pipeline
- Discussing network and security considerations for cloud data ingestion
If you’ve read the chapters up to this point, you can architect a good, layered data lake. Now it’s time to dive into a few of these layers in much greater detail.
In this chapter, we’ll focus on ...