Now we have a text file residing in HDFS that can be processed further by MapReduce jobs. Behind all the SQL is a MapReduce job: when Hive runs a SQL statement, it compiles it into one or more MapReduce jobs that execute on the cluster (you can see this for yourself with Hive's EXPLAIN keyword, as sketched after the steps below). We will use the Pentaho Data Integration SQL-related step to demonstrate this capability. Follow these steps:
1. Locate hdfs_to_hive_product_price_history.kjb in the chapter's code bundle folder and load the file into Spoon. You should see a job flow similar to the one shown in the following screenshot:
2. Run the job. It copies the product-price-history.tsv.gz file from the local folder into HDFS; the equivalent HDFS shell commands are sketched after these steps.
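For reference, the copy performed by the job in step 2 is equivalent to the following HDFS shell commands. This is only a sketch: the local and HDFS paths are assumptions, so adjust them to match your environment.

    # Create a target directory in HDFS (the path /user/pdi/raw is an assumed example)
    hdfs dfs -mkdir -p /user/pdi/raw

    # Copy the gzipped TSV file from the local filesystem into HDFS
    hdfs dfs -put product-price-history.tsv.gz /user/pdi/raw/

    # Confirm that the file now resides in HDFS
    hdfs dfs -ls /user/pdi/raw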
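To see the MapReduce job hiding behind the SQL, you can ask Hive for its execution plan with the EXPLAIN keyword. The following sketch assumes a Hive table named product_price_history already exists; the table and column names are illustrative only.

    # Ask Hive to print the execution plan instead of running the query
    hive -e "EXPLAIN SELECT product_id, MAX(price) FROM product_price_history GROUP BY product_id;"

For an aggregation like this one, the plan that Hive prints includes a Map Reduce stage: that stage is the job Hive submits to the cluster when the query actually runs.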