Loading data into Hadoop
Hadoop is at the heart of the Big Data movement. Derived from Google's white papers on MapReduce and the Google File System, Hadoop can scale beyond petabytes of data and provides the backbone for fast, effective data analysis.
Pentaho was one of the first companies to support Hadoop, and it has open-sourced those capabilities along with steps for other Big Data sources.
Note
Pentaho's Big Data wiki offers many useful tutorials and videos at http://wiki.pentaho.com/display/BAD/Pentaho+Big+Data+Community+Home.
Getting ready
Before we actually try to connect to Hadoop, we have to set up an appropriate environment. Companies like Hortonworks and Cloudera have been at the forefront of providing ...