9

Building an End-to-End Data-Wrangling Pipeline with AWS SDK for Pandas

In the previous chapters, we learned about the data-wrangling process and how to use different AWS services for data-wrangling activities:

  • We explored AWS Glue DataBrew, which lets any type of user build a data-wrangling pipeline through a GUI.
  • We went through SageMaker Data Wrangler, which also provides a GUI for building data-wrangling pipelines but is aimed at machine learning workloads and integrates more tightly with the SageMaker service.
  • We also explored AWS SDK for Pandas, aka awswrangler, which is a hands-on coding approach to data wrangling that integrates the Pandas library ...
