Chapter 3. Getting Started Building Your Data Lake

By now you’re probably thinking, “How does the cloud fit in? Why do organizations decide to build their data lakes in the cloud? How could this work in my environment?” As it turns out, there isn’t a one-size-fits-all reason for building a data lake in the cloud. But many of the requirements for getting value out of a data lake can be satisfied only in the cloud.

In this chapter, we answer the following questions:

  • As your company’s data initiatives mature, at what point should you think about moving to a data lake?

  • Why should you move your data into the cloud? What are the benefits?

  • What are the security concerns with moving data into the cloud?

  • How can you ensure proper governance of your data in the cloud?

The Benefits of Moving a Data Lake to the Cloud

The Enterprise Strategy Group asked companies which attributes mattered most to them when working with big data and analytics. Not surprisingly, virtually all of the attributes they listed are ones you get when building a data lake in the cloud, as depicted in Figure 3-1.

Figure 3-1. Most important attributes when building a data lake

With the cloud come the following key benefits:

Built-in security

When it comes to security, cloud providers have collected knowledge and best practices from all of their customers and have learned from the trials and errors of literally ...
