Chapter 5. Data Ingestion

You can have data without information, but you cannot have information without data.

—Daniel Keys Moran

Extraction, Loading, and Transformation (ELT)

A significant percentage of the effort put into Hadoop goes into loading data into the cluster and unloading data from it. Many ELT tools, including third-party tools, can be used with Hadoop (see Figure 5.1). The focus here, however, is on the tools that come with the Hadoop distribution. They include the following:

Flume (streaming)

Sqoop (SQL data sources)

WebHDFS (REST APIs)

HDFS NFS
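To make the WebHDFS entry concrete, the sketch below only builds WebHDFS REST URLs, which have the form http://<host>:<port>/webhdfs/v1/<path>?op=<OPERATION>; it does not send any requests. The NameNode host name and the helper name webhdfs_url are hypothetical, and the port assumes the Hadoop 3 default NameNode HTTP port of 9870 (Hadoop 2 used 50070). The user.name parameter applies only to simple, non-Kerberos authentication.

```python
from typing import Optional
from urllib.parse import quote, urlencode

# Hypothetical NameNode host; 9870 is assumed as the Hadoop 3 default HTTP port.
NAMENODE = "namenode.example.com"
PORT = 9870


def webhdfs_url(path: str, op: str, user: Optional[str] = None, **params: str) -> str:
    """Build (but do not send) a WebHDFS REST URL for an absolute HDFS path."""
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    # WebHDFS operation names are upper case, e.g. OPEN, CREATE, LISTSTATUS.
    query = {"op": op.upper()}
    if user is not None:
        # user.name identifies the caller under simple (non-Kerberos) auth.
        query["user.name"] = user
    # Any extra operation parameters, e.g. overwrite="true" for CREATE.
    query.update(params)
    return f"http://{NAMENODE}:{PORT}/webhdfs/v1{quote(path)}?{urlencode(query)}"


if __name__ == "__main__":
    print(webhdfs_url("/user/alice/data.csv", "open", user="alice"))
    print(webhdfs_url("/user/alice", "liststatus"))
```

In practice, an OPEN request is sent as an HTTP GET, and the NameNode typically answers with a redirect to a DataNode that serves the file contents, so a client would follow that redirect rather than expect the data from the NameNode itself.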

Figure 5.1 ...
