Chapter 12. Conclusion

Although there are numerous ways to build a data lake, we believe that the best approach is to adopt a cloud-native platform that can handle complex, varying workloads and deliver deep analytics and machine learning on your big data. Even if you are not using all of the technologies we mentioned, we hope you now see why we make this argument.

In this book, we’ve provided a brief history of big data tools as they evolved, first in the open source world and later as commercial or Software-as-a-Service distributions, along with the subsequent growth of the public cloud market. We’ve discussed how these trends have converged to give today’s enterprises powerful choices for extracting value from their structured and unstructured data.

We also showed you why you need a data lake to take full advantage of your data. We made the case for a data-driven culture and showed you how to build one. Then, we walked you through how to build a data lake and stressed the benefits of building it in the cloud. Once you’re in the cloud, you’ll need tools to manage your growing data lake, so we provided a roundup of them. Security is just as important in the cloud as it is in on-premises setups, and we explained how to get it right. Finally, we went deeper into the roles and responsibilities of three key members of the data team (data scientists, data engineers, and data analysts) to help you establish your own team with a structure ...

Get Operationalizing the Data Lake now with O’Reilly online learning.