HDInsight Essentials - Second Edition by Rajesh Nadipalli

Tools and technology for Hadoop ecosystem

Next-generation architecture includes Hadoop-based projects that complement traditional RDBMSs. The following table highlights key projects, organized by Data Lake capability:

| Capability | Tool | Description |
| --- | --- | --- |
| Ingest | Flume | A distributed, reliable service that collects large amounts of data from different sources, such as log files, and streams it into Hadoop. |
| Ingest | Sqoop | Transfers data between Hadoop and relational databases such as Oracle, Teradata, and SQL Server. |
| Organize | HCatalog | Stores metadata for Hadoop, including file structures and formats. It provides an abstraction and interoperability across various tools such as Pig, MapReduce, Hive, ... |
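
To make the Sqoop entry more concrete, the following is a minimal sketch of how an ingest job might assemble a `sqoop import` command line. It only builds and prints the command and does not run it. The helper name `build_sqoop_import`, the JDBC URL, the user name, the table, and the HDFS target directory are all hypothetical values chosen for illustration; the flags shown (`--connect`, `--username`, `-P`, `--table`, `--target-dir`, `--num-mappers`) are standard Sqoop import arguments.

```python
import shlex


def build_sqoop_import(jdbc_url, username, table, target_dir, num_mappers=4):
    """Return the argument list for a `sqoop import` run (nothing is executed).

    Hypothetical helper for illustration; it is not part of Sqoop itself.
    """
    return [
        "sqoop", "import",
        "--connect", jdbc_url,              # JDBC connection string of the source RDBMS
        "--username", username,
        "-P",                               # prompt for the password at run time
        "--table", table,                   # source table to copy into Hadoop
        "--target-dir", target_dir,         # HDFS directory that receives the data
        "--num-mappers", str(num_mappers),  # number of parallel map tasks
    ]


# All values below are placeholders, not settings from the book.
cmd = build_sqoop_import(
    jdbc_url="jdbc:sqlserver://dbhost.example.com:1433;databaseName=sales",
    username="etl_user",
    table="orders",
    target_dir="/data/raw/orders",
)

# Print a shell-safe version of the command for review.
print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps each argument separate, so the same list could later be passed to a process runner on a machine where Sqoop is installed without extra quoting.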
