Chapter 2: Data Collection
As discussed in the first chapter, data collection is the first step in building a big data pipeline. The objective of data collection is to store data on the AWS platform so that it can be used to extract insights and predict future events. In this chapter, we will discuss the different types of ingestion and the technologies best suited to each type. Broadly, there are three types of data sources from which we will need to collect data:
- Existing transactional systems (such as CRM or POS systems), which are typically built on databases such as Aurora, MySQL, Oracle, or Microsoft SQL Server
- Streaming data from IoT devices, sensors, and social media
- Files coming from web ...