HDInsight Essentials - Second Edition by Rajesh Nadipalli

Journey to your Data Lake dream

Hadoop's HDFS and YARN are the core components of the next-generation Data Lake, but several other components must be built to realize the vision. This section covers the core capabilities needed to enable an Enterprise Data Lake. The following are the key components of an effective Data Lake:

[Figure: Journey to your Data Lake dream]

Let us look at each component in detail.

Ingestion and organization

A Data Lake based on HDFS sits on a scalable, distributed filesystem, so it requires an equally scalable ingestion framework and software that can take in structured, unstructured, and streaming data.
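
As a rough illustration of the organization side, the following minimal Python sketch shows one way incoming files could be routed to a landing directory by data kind, source, and date. The root path `/datalake/raw`, the folder names, and the `landing_path` helper are all hypothetical and chosen only for this example; they are not a layout prescribed by HDFS or HDInsight.

```python
from datetime import date
from pathlib import PurePosixPath

# Hypothetical landing-zone root, assumed for illustration only.
LANDING_ROOT = PurePosixPath("/datalake/raw")

# Hypothetical folder names for the three kinds of data named above.
KIND_FOLDERS = {
    "structured": "structured",
    "unstructured": "unstructured",
    "streaming": "streaming",
}


def landing_path(kind: str, source: str, day: date, filename: str) -> PurePosixPath:
    """Build the directory path where an incoming file would land.

    Layout (an assumption): <root>/<kind>/<source>/<yyyy>/<mm>/<dd>/<filename>
    """
    if kind not in KIND_FOLDERS:
        raise ValueError(f"unknown data kind: {kind!r}")
    # Partitioning by date keeps each day's ingest in its own directory.
    return LANDING_ROOT / KIND_FOLDERS[kind] / source / f"{day:%Y/%m/%d}" / filename


if __name__ == "__main__":
    print(landing_path("structured", "sales_db", date(2015, 1, 31), "orders.csv"))
```

Keeping the kind, source, and date in the path is one possible convention that lets later processing select data by directory instead of scanning every file; other layouts would serve equally well.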
