Pentaho for Big Data Analytics by Feris Thia, Manoj R Patil

Putting a data file into HDFS

The previous example shows how PDI interacts with Hive using a SQL-like expression.

Now let's work with Hadoop's filesystem, HDFS. We will copy a CSV text file into an HDFS folder. Follow these steps:

  1. Download a compressed CSV sample file from http://goo.gl/EdJwk5.
  2. Create a new job from Spoon.
  3. Add the following steps to the workspace and create a flow between them:
    • From the General grouping, choose START
    • From the Big Data grouping, choose Hadoop Copy Files
  4. Double-click on Hadoop Copy Files. The step's editor dialog will appear.
  5. Click on the Browse button next to the File/Folder textbox. The Open File dialog appears; choose ...
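
For comparison, a similar copy can be done outside Spoon with the Hadoop command-line client. The following is only a minimal sketch and does not describe how the Hadoop Copy Files step works internally. It assumes the `hdfs` command is on the PATH of a machine configured for the cluster; the file name `sample.csv` and the HDFS folder `/user/sample` are hypothetical placeholders, not values taken from the steps above. By default the function only prints the commands, so it can be tried without a cluster.

```python
import shlex
import subprocess
from pathlib import PurePosixPath


def build_hdfs_put_commands(local_file, hdfs_dir):
    """Build the `hdfs dfs` commands that create a folder and upload one file."""
    # Normalize the HDFS folder (drops a trailing slash); HDFS paths are POSIX-style.
    target = str(PurePosixPath(hdfs_dir))
    return [
        # -p: create parent folders as needed, no error if the folder exists
        ["hdfs", "dfs", "-mkdir", "-p", target],
        # -f: overwrite the destination file if it is already there
        ["hdfs", "dfs", "-put", "-f", local_file, target],
    ]


def copy_to_hdfs(local_file, hdfs_dir, dry_run=True):
    """Print the commands (dry_run=True) or execute them against the cluster."""
    commands = build_hdfs_put_commands(local_file, hdfs_dir)
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            # check=True raises CalledProcessError if the hdfs client fails
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical file and folder names, for illustration only.
    copy_to_hdfs("sample.csv", "/user/sample")
```

Passing `dry_run=False` runs the commands for real; the uploaded file can then be checked with `hdfs dfs -ls` on the target folder.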