Chapter 3. Challenges and Complications

A data lake is not a panacea. It has its own challenges, and organizations that want to deploy one must address those challenges head-on. As this book has discussed, data lakes are built to store and provide access to large volumes of disparate data. Rather than being split across rigid, limited EDWs, all your data can be stored together for discovery, so that more of it can be put to business use. This solves two problems that have plagued traditional approaches to Big Data: it eliminates data silos, and it enables organizations to make use of new types of data (e.g., streaming and unstructured data), which are difficult to place in a traditional EDW.

However, challenges still exist in building, managing, and getting value out of the data lake. We’ll examine these challenges in turn.

Challenges of Building a Data Lake

When building a data lake, you can run into three potential roadblocks: the rate of change in the technology ecosystem, the scarcity of skilled personnel, and the technical complexity of Hadoop.

Rate of Change

The Hadoop ecosystem is large, complex, and constantly changing. Keeping up with developments in the open-source community can be a full-time job in itself. Each component is continually evolving, and new tools and solutions keep emerging from the community. For an overview of the Hadoop ecosystem, check out The Hadoop Ecosystem Table on GitHub.

Acquiring Skilled ...
