Chapter 2. Moving data in and out of Hadoop

This chapter covers
  • Understanding key design considerations for data ingress and egress tools
  • Techniques for moving log files into HDFS and Hive
  • Using relational databases and HBase as data sources and data sinks

Moving data in and out of Hadoop, which I’ll refer to in this chapter as data ingress and egress, is the process by which data is transported from an external system into an internal system, and vice versa. Hadoop supports ingress and egress at a low level in HDFS and MapReduce. Files can be moved in and out of HDFS, and data can be pulled from external data sources and pushed to external data sinks using MapReduce. Figure 2.1 shows some of Hadoop’s ingress and egress mechanisms. ...
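
The lowest-level form of HDFS ingress and egress is copying whole files with the Hadoop file system shell. The Python sketch below builds the `hdfs dfs -put` (local to HDFS) and `hdfs dfs -get` (HDFS to local) commands and runs them through `subprocess`. It assumes a Hadoop client is installed and configured on the machine running it. The helper names `put_command`, `get_command`, and `run` and all file paths are hypothetical, chosen for illustration only.

```python
import shutil
import subprocess
from typing import List


def put_command(local_path: str, hdfs_path: str) -> List[str]:
    """Ingress: copy a local file into HDFS with `hdfs dfs -put`."""
    return ["hdfs", "dfs", "-put", local_path, hdfs_path]


def get_command(hdfs_path: str, local_path: str) -> List[str]:
    """Egress: copy a file out of HDFS to local disk with `hdfs dfs -get`."""
    return ["hdfs", "dfs", "-get", hdfs_path, local_path]


def run(cmd: List[str]) -> None:
    """Execute a command, failing early if the hdfs client is not on PATH."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} client not found on PATH")
    # check=True raises CalledProcessError if the copy fails.
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical paths: print the commands rather than running them.
    print(" ".join(put_command("access.log", "/data/logs/access.log")))
    print(" ".join(get_command("/data/results/part-r-00000", "results.txt")))
```

Building the argument list separately from executing it keeps the sketch testable without a cluster. On a configured client you would call `run(put_command(...))` to perform the copy.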
