Book description
The data lake is a daring new approach for harnessing the power of big data technology and providing convenient self-service capabilities. But is it right for your company? This book is based on discussions with practitioners and executives from more than a hundred organizations, ranging from data-driven companies such as Google, LinkedIn, and Facebook, to governments and traditional corporate enterprises. You’ll learn what a data lake is, why enterprises need one, and how to build one successfully with the best practices in this book.
Alex Gorelik, CTO and founder of Waterline Data, explains why old systems and processes can no longer support data needs in the enterprise. Then, in a collection of essays about data lake implementation, you’ll examine data lake initiatives, analytic projects, experiences, and best practices from data experts working in various industries.
- Get a succinct introduction to data warehousing, big data, and data science
- Learn various paths enterprises take to build a data lake
- Explore how to build a self-service model, along with best practices for giving analysts access to the data
- Use different methods for architecting your data lake
- Discover ways to implement a data lake from experts in different industries
Table of contents
- Preface
- 1. Introduction to Data Lakes
- 2. Historical Perspective
- 3. Introduction to Big Data and Data Science
- 4. Starting a Data Lake
- 5. From Data Ponds/Big Data Warehouses to Data Lakes
- 6. Optimizing for Self-Service
- 7. Architecting the Data Lake
- 8. Cataloging the Data Lake
- 9. Governing Data Access
- 10. Industry-Specific Perspectives
- Index
Product information
- Title: The Enterprise Big Data Lake
- Author(s): Alex Gorelik
- Release date: March 2019
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781491931554