Chapter 5. Data Ingestion
You can have data without information, but you cannot have information without data.
—Daniel Keys Moran
Extraction, Loading, and Transformation (ELT)
A significant share of the effort put into Hadoop goes into loading data into the cluster and unloading it again. A number of ELT tools and third-party tools come with the Hadoop distribution (see Figure 5.1); the focus here, however, is on the Hadoop tools, which include the following:
Flume (streaming)
Sqoop (SQL data sources)
WebHDFS (REST APIs)
HDFS NFS
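To give a feel for the REST-style access that WebHDFS offers, the following minimal sketch builds request URLs of the form http://<host>:<port>/webhdfs/v1/<path>?op=<OPERATION>. The host name, port, and HDFS path used here are placeholders chosen for illustration, and webhdfs_url is a hypothetical helper, not part of Hadoop. The sketch only constructs the URL; it does not contact a cluster.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS REST URL for an absolute HDFS path.

    host, port, and path are supplied by the caller; the values used
    below are illustrative assumptions, not defaults of any cluster.
    """
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    # The operation goes in the query string, followed by any extra parameters.
    query = urlencode({"op": op, **params})
    # Percent-encode the path but keep "/" as the separator.
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# List a directory (placeholder host, port, and path).
print(webhdfs_url("namenode.example.com", 50070, "/user/hadoop/logs", "LISTSTATUS"))
```

A URL built this way could then be fetched with any HTTP client, for example to list a directory with LISTSTATUS or to read a file with OPEN, provided WebHDFS is enabled on the cluster.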