5

Object Stores and Data Lakes

Enterprises have leaned heavily on databases and data warehouses for many decades. Around the turn of the millennium, as the internet age took hold, the proliferation of connected devices began to produce a volume and variety of data that traditional databases and warehouses could no longer keep up with.

While developing a web indexing solution to handle this large influx of data, Google published a paper in 2003 titled “The Google File System” (GFS) that would shape industry solutions for the next two decades. This solution allowed for the development of data lakes, which in turn led to lakehouses. Data lakes are distributed file systems that provide a cost-efficient method to store structured, unstructured, ...
