Chapter 3: Data Ingestion

In the previous chapter, we discussed the fundamental concepts and inner workings of the features and microservices available in AWS Glue, such as the Glue Data Catalog, connections, crawlers and classifiers, the schema registry, Glue ETL jobs, development endpoints, interactive sessions, and triggers. We also explored how AWS Glue crawlers aid data discovery by crawling different types of data stores (Amazon S3, JDBC sources such as Amazon RDS or on-premises databases, and DynamoDB/MongoDB/DocumentDB) to infer the schema and populate the AWS Glue Data Catalog. While discussing Glue ETL in the previous chapter, we introduced a few of the important extensions/features of Spark ETL, including GlueContext, DynamicFrame, JobBookmark ...
