Chapter 6. Looking Ahead

As the data lake becomes an important part of next-generation data architectures, several trends are emerging across different vertical use cases that indicate what the future of data lakes will look like.

Ground-to-Cloud Deployment Options

Currently, most data lakes reside on-premises, but a growing number of enterprises are moving to the cloud because of the agility, ease of use, and economic benefits of a cloud-based platform. As clouds, both private and public, mature from security and multi-tenancy perspectives, we’ll see this trend intensify, and it’s important that data lake tools work across both environments.

As a result, we’re seeing increased adoption of cloud-based Hadoop infrastructures that complement and sometimes even replace on-premises Hadoop deployments. As data onboarding, management, and governance mature and become easier, data needs to be accessible in cloud-based architectures the same way it is available in on-premises architectures. Most data lake vendors are extending their tools so they work seamlessly across cloud and physical environments. This allows business users and data scientists to spin clusters up and down in the cloud, and to create augmented platforms for both agile analytics and traditional queries.

With a cloud-to-ground environment, you have a hybrid architecture that may be useful for organizations that have yet to build their own private clouds. It can be used to store sensitive or vulnerable ...
