5

Working with Amazon S3

In previous chapters, we repeatedly discussed big data and data lakes, and how organizations use them to store data and extract valuable insights from it. They do so through the data wrangling processes outlined in Chapter 1, using Amazon Web Services (AWS) tools such as AWS Glue DataBrew, the AWS SDK for Pandas, and SageMaker Data Wrangler. This chapter delves deeper into the specifics of big data and data lakes.

Specifically, we will cover the following topics:

  • The definition and concept of big data
  • The characteristics of big data
  • The concept and definition of a data lake
  • Best practices for building a data lake on Amazon Simple Storage Service (Amazon S3)
  • The layout and organization ...


O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.