
Virtualizing Hadoop: How to Install, Deploy, and Optimize Hadoop in a Virtualized Architecture by George J. Trujillo Jr., Justin Murray, Rommel Garcia, Steven Jones, Charles Kim



Chapter 5. Data Ingestion

You can have data without information, but you cannot have information without data.

—Daniel Keys Moran

Extraction, Loading, and Transformation (ELT)

A significant share of the effort put into Hadoop goes into loading data into the cluster and unloading data from it. The Hadoop distribution comes with a number of ELT tools, and third-party tools are also available (see Figure 5.1). The focus here, however, is on the Hadoop tools, which include the following:

Flume (streaming)

Sqoop (SQL data sources)

WebHDFS (REST APIs)

HDFS NFS
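
To make the WebHDFS entry concrete, the following is a minimal sketch of how a WebHDFS REST URL is put together. WebHDFS exposes HDFS operations under the /webhdfs/v1 path prefix, with the operation named in the op query parameter. The helper name webhdfs_url, the host namenode.example.com, the directory /data/landing, and port 50070 (a common NameNode HTTP default in Hadoop 2.x) are assumptions for illustration only; substitute the values for your own cluster. The sketch only builds the URL and makes no network call.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, hdfs_path, op, **params):
    """Build a WebHDFS REST URL (hypothetical helper, not part of Hadoop)."""
    if not hdfs_path.startswith("/"):
        raise ValueError("hdfs_path must be an absolute HDFS path")
    # The operation goes in the 'op' query parameter; extra parameters
    # (for example user.name) are appended after it.
    query = urlencode({"op": op, **params})
    # quote() percent-encodes unsafe characters but leaves '/' intact.
    return f"http://{host}:{port}/webhdfs/v1{quote(hdfs_path)}?{query}"


# Assumed host, port, and path; LISTSTATUS lists a directory over HTTP GET.
url = webhdfs_url("namenode.example.com", 50070, "/data/landing", "LISTSTATUS")
print(url)
```

A URL built this way could then be requested with any HTTP client. Operations that read, such as LISTSTATUS and OPEN, use GET, while operations that create, such as MKDIRS and CREATE, use PUT.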

Figure 5.1 ...
