Many organizations use Hadoop-driven data lakes as an adjunct staging area for their enterprise data warehouses (EDW). But for those companies ready to take the plunge, a data lake is far more useful as a one-stop shop for extracting insights from their vast collection of data. With this ebook, you’ll learn best practices for building, maintaining, and deriving value from a Hadoop data lake in production environments.
Authors Alice LaPlante and Ben Sharma explain how a data lake will enable your organization to manage an increasing volume of datasets—from blog postings and product reviews to streaming data—and to discover important relationships between them. Whether you want to control administrative costs in healthcare or reduce risk in financial services, this ebook addresses the architectural considerations and required capabilities you need to build your own data lake.
With this report, you’ll learn:
- The key attributes of a data lake, including its ability to store information in native formats for later processing
- Why implementing data management and governance in your data lake is crucial
- How to address the challenges of building and managing a data lake
- Self-service options that enable different users to access the data lake without help from IT
- Emerging trends that will shape the future of data lakes
Table of contents
- What Is a Data Lake?
- Data Management and Governance in the Data Lake
- How to Deploy a Data Lake Management Platform
- 2. How Data Lakes Work
- 3. Challenges and Complications
- 4. Curating the Data Lake
- 5. Deriving Value from the Data Lake
- 6. Looking Ahead
- Title: Architecting Data Lakes
- Release date: April 2016
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781491952597