Chapter 15: Architecting Data Lakes for Real-World Scenarios and Edge Cases

By now we are well versed in the concept of a data lake: a centralized repository that lets you store all your structured and unstructured data at any scale. Because a data lake focuses primarily on storage, it requires less processing power than alternatives such as a data warehouse, which makes it easier, faster, and more cost-effective to scale as data volumes grow.

A data lake is not just a repository, however; it requires a well-designed data architecture, along with proper planning and management. Because it follows a data-driven design, it lets you rapidly ingest raw data before any business requirements come into the picture. There are a variety of tools ...

