Chapter 5. Moving data into and out of Hadoop

This chapter covers

  • Understanding key design considerations for data ingress and egress tools
  • Low-level methods for moving data into and out of Hadoop
  • Techniques for moving log files and relational and NoSQL data, as well as data in Kafka, into and out of HDFS

Data movement is one of those things that you aren’t likely to think too much about until you’re fully committed to using Hadoop on a project, at which point it becomes this big scary unknown that has to be tackled. How do you get your log data sitting across thousands of hosts into Hadoop? What’s the most efficient way to get your data out of your relational and No/NewSQL systems and into Hadoop? How do you get Lucene indexes generated in ...
