Deciphering Data Architectures

by James Serra
February 2024
Intermediate to advanced
278 pages
English
O'Reilly Media, Inc.

Chapter 5. Data Lake

Big data started appearing in unprecedented volumes in the early 2010s due to an increase in sources that output semi-structured and unstructured data, such as sensors, videos, and social media. Semi-structured and unstructured data hold a phenomenal amount of value. Think of the insights contained in years’ worth of customer emails! However, relational data warehouses at that time could only handle structured data. They also had trouble handling large amounts of data or data that needed to be ingested often, so they were not an option for storing these types of data. This forced the industry to come up with a solution: data lakes. Data lakes can easily handle semi-structured and unstructured data and manage data that is ingested often.

Years ago, I spoke with analysts from a large retail chain who wanted to ingest data from Twitter to see what customers thought about their stores. They knew customers would hesitate to bring up complaints to store employees but would be quick to put them on Twitter. I helped them to ingest the Twitter data into a data lake and assess the sentiment of the customer comments, categorizing them as positive, neutral, or negative. When they read the negative comments, they found an unusually large number of complaints about dressing rooms—they were too small, too crowded, and not private enough. As an experiment, the company decided to remodel the dressing rooms in one store. A month after the remodel, the analysts found an overwhelming ...

ISBN: 9781098150754