The Enterprise Big Data Lake
by Alex Gorelik
March 2019
Beginner to intermediate
221 pages
6h 35m
English
O'Reilly Media, Inc.

Chapter 9. Governing Data Access

This chapter describes the challenges of providing analysts access to the data in a data lake and presents several best practices for doing so. Data lakes differ from more traditional data storage in several ways:

Load

The numbers of data sets, users, and changes are extremely high.

Frictionless ingestion

Because a data lake stores data for future, yet-to-be-determined analytics, it usually ingests the data with minimal, if any, processing.

Encryption

There are often government or internal regulations that require sensitive or personal information to be protected, yet that data is needed for analysis.

Exploratory nature of work

A lot of data science work cannot be anticipated by IT staff. Data scientists often do not know what’s available in the huge and diverse data store. This creates a catch-22 situation for traditional approaches: if analysts cannot find data that they don’t have access to, they can’t ask for access to it.

The easiest access model is to provide all analysts access to all data. Unfortunately, this cannot be done if the data is subject to government regulations (as is the case, for example, with personally identifiable information or credit card information), is copyrighted with restricted access (e.g., if it has been purchased or obtained from external sources for very specific or limited use), or is considered critical and sensitive by the company for competitive or other reasons. Most companies have data they consider ...
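The three restrictions named above can be sketched as a simple check on a data set's descriptor. This is a minimal illustration, not the book's method: the `DataSet` class, its flag names, and both helper functions are hypothetical, and real classification of regulated or licensed data usually involves far more than three booleans.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataSet:
    """Hypothetical descriptor for a data set in the lake (an assumption)."""
    name: str
    regulated: bool = False           # e.g., PII or credit card information
    license_restricted: bool = False  # purchased or obtained for limited use
    company_sensitive: bool = False   # critical for competitive or other reasons


def restrictions(ds: DataSet) -> List[str]:
    """Return the reasons, if any, that rule out giving every analyst access."""
    reasons = []
    if ds.regulated:
        reasons.append("government regulation")
    if ds.license_restricted:
        reasons.append("restricted license")
    if ds.company_sensitive:
        reasons.append("company-sensitive")
    return reasons


def open_to_all_analysts(ds: DataSet) -> bool:
    """The 'all analysts, all data' model applies only when nothing restricts it."""
    return not restrictions(ds)


# Illustrative data sets (names are made up for this sketch).
weather = DataSet("public_weather")
customers = DataSet("customers", regulated=True, company_sensitive=True)

print(open_to_all_analysts(weather))
print(restrictions(customers))
```

Returning the list of reasons rather than a bare boolean keeps the decision explainable: an analyst who is denied access can see which restriction applies.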

Publisher Resources

ISBN: 9781491931547