Chapter 9: Implementing Batch ETL Pipeline with Amazon EMR and Apache Spark

In Chapter 2, Exploring the Architecture and Deployment Options, you learned about different EMR use cases such as batch Extract, Transform, and Load (ETL), real-time streaming with EMR and Spark Streaming, data preparation for machine learning (ML) models, interactive analytics, and more.

In this chapter, we will dive deep into one of those use cases, batch ETL with Amazon EMR and Apache Spark, and walk through the implementation steps that you can follow to replicate the setup in your AWS account.

We will cover the following topics, which will help you understand the use case, its application architecture, and how a transient EMR cluster with Spark can be integrated for ...

