Hadoop in Practice, Second Edition by Alex Holmes

Chapter 5. Moving data into and out of Hadoop

This chapter covers

  • Understanding key design considerations for data ingress and egress tools
  • Low-level methods for moving data into and out of Hadoop
  • Techniques for moving log files and relational and NoSQL data, as well as data in Kafka, in and out of HDFS

Data movement is one of those things that you aren’t likely to think too much about until you’re fully committed to using Hadoop on a project, at which point it becomes this big scary unknown that has to be tackled. How do you get your log data sitting across thousands of hosts into Hadoop? What’s the most efficient way to get your data out of your relational and No/NewSQL systems and into Hadoop? How do you get Lucene indexes generated in ...