O'Reilly logo

Hadoop MapReduce v2 Cookbook - Second Edition by Thilina Gunarathne

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Importing data to HDFS from a relational database using Apache Sqoop

Apache Sqoop is a project that enables efficient bulk transfer of data between Apache Hadoop ecosystem and relational data stores. Sqoop can be used to automate the process of importing data from or exporting data to RDBMSs such as MySQL, PostgreSQL, Oracle, and so on. Sqoop also supports database appliances such as Netezza and Teradata, as well. It supports parallel import/export of data using multiple Map tasks and also supports throttling to reduce the load on the external RDBMSs.

In this recipe, we'll be using Sqoop2 to import data from a PostgreSQL database in to HDFS. We also include instructions for Sqoop 1.4.x as well, due to the wide availability and usage of that Sqoop ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required