Time for action – reduce-side join using MultipleInputs

We can produce the report explained in the previous section with a reduce-side join by performing the following steps:

  1. Create the following tab-separated file and name it sales.txt:
    001	35.99	2012-03-15
    002	12.49	2004-07-02
    004	13.42	2005-12-20
    003	499.99	2010-12-20
    001	78.95	2012-04-02
    002	21.99	2006-11-30
    002	93.45	2008-09-10
    001	9.99	2012-05-17
  2. Create the following tab-separated file and name it accounts.txt:
    001	John Allen	Standard	2012-03-15
    002	Abigail Smith	Premium	2004-07-13
    003	April Stevens	Standard	2010-12-20
    004	Nasser Hafez	Premium	2001-04-23
  3. Copy the data files onto HDFS:
    $ hadoop fs -mkdir sales
    $ hadoop fs -put sales.txt sales/sales.txt
    $ hadoop fs -mkdir accounts
    $ hadoop fs -put accounts.txt accounts/accounts.txt
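
Before wiring these files into a Hadoop job, it can help to see what the join is meant to produce. The following is a minimal plain-Java sketch, not the Hadoop implementation itself: it imitates the map, shuffle, and reduce phases in memory. The class name ReduceSideJoinSketch, the "sales" and "accounts" tags, and the output layout (name, number of sales, total value) are assumptions made for illustration. In particular, the exact report format is defined in the previous section and may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory imitation of a reduce-side join; no Hadoop classes are used.
class ReduceSideJoinSketch {

    // "Map" step: key each record by its account ID (first field) and
    // prefix the remaining fields with a tag naming the source file.
    static void mapInto(Map<String, List<String>> groups, String line, String tag) {
        String[] fields = line.split("\t", 2);
        groups.computeIfAbsent(fields[0], k -> new ArrayList<>())
              .add(tag + "\t" + fields[1]);
    }

    // "Reduce" step: all values for one account ID arrive together;
    // the tag tells us whether a value is the account record or a sale.
    static String reduce(List<String> values) {
        String name = "unknown";
        int count = 0;
        double total = 0.0;
        for (String value : values) {
            String[] fields = value.split("\t");
            if (fields[0].equals("accounts")) {
                name = fields[1];                        // account holder's name
            } else {
                count++;                                 // one more sale
                total += Double.parseDouble(fields[1]);  // sale value
            }
        }
        // Assumed report layout: name, number of sales, total value.
        return String.format(Locale.ROOT, "%s\t%d\t%.2f", name, count, total);
    }

    // Runs both "mappers", groups by key (the shuffle), then reduces each group.
    static List<String> join(List<String> sales, List<String> accounts) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String line : sales) mapInto(groups, line, "sales");
        for (String line : accounts) mapInto(groups, line, "accounts");
        List<String> report = new ArrayList<>();
        for (List<String> values : groups.values()) report.add(reduce(values));
        return report;
    }

    public static void main(String[] args) {
        List<String> sales = Arrays.asList(
            "001\t35.99\t2012-03-15", "002\t12.49\t2004-07-02",
            "004\t13.42\t2005-12-20", "003\t499.99\t2010-12-20",
            "001\t78.95\t2012-04-02", "002\t21.99\t2006-11-30",
            "002\t93.45\t2008-09-10", "001\t9.99\t2012-05-17");
        List<String> accounts = Arrays.asList(
            "001\tJohn Allen\tStandard\t2012-03-15",
            "002\tAbigail Smith\tPremium\t2004-07-13",
            "003\tApril Stevens\tStandard\t2010-12-20",
            "004\tNasser Hafez\tPremium\t2001-04-23");
        for (String row : join(sales, accounts)) System.out.println(row);
    }
}
```

Tagging each value with its source is the key design choice here: once records from both files are grouped under the same account ID, the tag is the only way for the reduce step to tell an account record from a sale.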
    
