Pentaho for Big Data Analytics by Feris Thia, Manoj R Patil

Loading data from HDFS into Hive (job orchestration)

We now have a text file residing in HDFS that can be processed further by MapReduce jobs. In Hive, every SQL statement is translated behind the scenes into a MapReduce job. We will use the Pentaho Data Integration SQL-related step to demonstrate this capability (a HiveQL sketch of the equivalent statements appears after the list of steps). Follow these steps:

  1. Launch Spoon if it is not running.
  2. Open hdfs_to_hive_product_price_history.kjb from the chapter's code bundle folder in Spoon. You should see a job flow similar to the one shown in the following screenshot:
    [Screenshot: the hdfs_to_hive_product_price_history.kjb job flow]
  3. The Hadoop Copy Files step is responsible for copying the product-price-history.tsv.gz file from the local folder into HDFS (a command-line equivalent is sketched after this list).
  4. The TABLE ...
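
For reference, the Hadoop Copy Files step in step 3 performs the same operation as the Hadoop filesystem put command. The following is a minimal sketch run from a Hive session; both the local path and the HDFS destination here are assumptions for illustration, not values taken from the job file:

    -- Copy the gzipped TSV from the local filesystem into HDFS,
    -- mirroring what the Hadoop Copy Files step does
    -- (both paths are assumed for illustration):
    dfs -put /tmp/product-price-history.tsv.gz /user/pentaho/staging/;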
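
Because the job is only excerpted here, the exact HiveQL it issues is not shown; the statements below are a hedged sketch of what a table-creation-and-load step typically executes. The table name, column list, and staging path are all assumptions:

    -- Define a tab-delimited Hive table over plain text
    -- (the schema is hypothetical; adjust it to the actual file layout):
    CREATE TABLE IF NOT EXISTS product_price_history (
      product_id INT,
      price      DOUBLE,
      price_date STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    STORED AS TEXTFILE;

    -- Move the staged file into the table's warehouse directory;
    -- Hive reads gzip-compressed text files transparently:
    LOAD DATA INPATH '/user/pentaho/staging/product-price-history.tsv.gz'
    INTO TABLE product_price_history;

Once the data is loaded, any SELECT against product_price_history is compiled into a MapReduce job, which is exactly the behavior this exercise demonstrates.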
