Chapter 9: Serverless ETL Pipelines

In the previous chapter, you learned how to tame unstructured or loosely structured data using Athena to manipulate logs, JavaScript Object Notation (JSON), and other types of machine-generated data. In this chapter, we'll continue with the theme of controlling chaos by using automation to normalize newly arrived data through a process known as extract, transform, load (ETL). We'll start with a brief explanation of ETL and then move on to best practices and common pitfalls of using Athena for ETL.

As with most of the chapters in this book, we'll then get hands-on by designing and implementing a serverless ETL pipeline. More precisely, we'll implement ...

Serverless Analytics with Amazon Athena by Anthony Virtuoso, Mert Turkay Hocanin, Aaron Wishnick
