Hadoop Application Architectures

by Mark Grover, Ted Malaska, Jonathan Seidman, Gwen Shapira
July 2015
Content level: Intermediate to advanced
250 pages
10h 47m
English
O'Reilly Media, Inc.

Chapter 2. Data Movement

Now that we’ve discussed considerations around storing and modeling data in Hadoop, we’ll move to the equally important subject of moving data between external systems and Hadoop. This includes ingesting data into Hadoop from systems such as relational databases or logs, and extracting data from Hadoop for ingestion into external systems. We’ll spend a good part of this chapter talking about considerations and best practices around data ingestion into Hadoop, and then dive more deeply into specific tools for data ingestion, such as Flume and Sqoop. We’ll then discuss considerations and recommendations for extracting data from Hadoop.

Data Ingestion Considerations

Just as important as decisions around how to store data in Hadoop, which we discussed in Chapter 1, are the architectural decisions on getting that data into your Hadoop cluster. Although Hadoop provides a filesystem client that makes it easy to copy files in and out of Hadoop, most applications implemented on Hadoop involve ingestion of disparate data types from multiple sources and with differing requirements for frequency of ingestion. Common data sources for Hadoop include:

  • Traditional data management systems such as relational databases and mainframes

  • Logs, machine-generated data, and other forms of event data

  • Files being imported from existing enterprise data storage systems

There are a number of factors to take into consideration when you’re importing data into Hadoop from these ...


ISBN: 9781491910313