Time for action – reduce-side join using MultipleInputs
We can produce the report described in the previous section with a reduce-side join by performing the following steps:
- Create the following tab-separated file and name it sales.txt:
    001	35.99	2012-03-15
    002	12.49	2004-07-02
    004	13.42	2005-12-20
    003	499.99	2010-12-20
    001	78.95	2012-04-02
    002	21.99	2006-11-30
    002	93.45	2008-09-10
    001	9.99	2012-05-17
- Create the following tab-separated file and name it accounts.txt:
    001	John Allen	Standard	2012-03-15
    002	Abigail Smith	Premium	2004-07-13
    003	April Stevens	Standard	2010-12-20
    004	Nasser Hafez	Premium	2001-04-23
- Copy the data files onto HDFS:
    $ hadoop fs -mkdir sales
    $ hadoop fs -put sales.txt sales/sales.txt
    $ hadoop fs -mkdir accounts
    $ hadoop fs -put accounts.txt accounts/accounts.txt
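Before writing the Hadoop job, it can help to preview what joining these two files on the account ID should yield. The following is a minimal sketch in plain Java, not the MultipleInputs job itself. It assumes the report lists each account holder's name with the count and total value of their sales, which is a guess at the shape of the earlier report, and the class name ReduceSideJoinSketch and the method join are hypothetical names chosen for illustration.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local simulation of a reduce-side join; it uses no Hadoop classes.
public class ReduceSideJoinSketch {

    // Joins sales and account lines on the account ID (the first tab-separated field).
    // Assumed report format: name, number of sales, total sales value.
    public static List<String> join(List<String> sales, List<String> accounts) {
        // "Map" and "shuffle": tag each value with its source and group by join key,
        // much as one mapper per input file followed by the framework's sort would.
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : sales) {
            String[] f = line.split("\t");
            grouped.computeIfAbsent(f[0], k -> new ArrayList<>()).add("sales\t" + f[1]);
        }
        for (String line : accounts) {
            String[] f = line.split("\t");
            grouped.computeIfAbsent(f[0], k -> new ArrayList<>()).add("accounts\t" + f[1]);
        }

        // "Reduce": each key sees values from both sources and combines them.
        List<String> report = new ArrayList<>();
        for (List<String> values : grouped.values()) {
            String name = null;
            int count = 0;
            BigDecimal total = BigDecimal.ZERO;
            for (String value : values) {
                String[] parts = value.split("\t", 2);
                if (parts[0].equals("accounts")) {
                    name = parts[1];
                } else {
                    count++;
                    total = total.add(new BigDecimal(parts[1]));
                }
            }
            // Skip keys that have sales but no matching account record.
            if (name != null) {
                report.add(name + "\t" + count + "\t" + total);
            }
        }
        return report;
    }

    public static void main(String[] args) {
        List<String> sales = Arrays.asList(
            "001\t35.99\t2012-03-15", "002\t12.49\t2004-07-02",
            "004\t13.42\t2005-12-20", "003\t499.99\t2010-12-20",
            "001\t78.95\t2012-04-02", "002\t21.99\t2006-11-30",
            "002\t93.45\t2008-09-10", "001\t9.99\t2012-05-17");
        List<String> accounts = Arrays.asList(
            "001\tJohn Allen\tStandard\t2012-03-15",
            "002\tAbigail Smith\tPremium\t2004-07-13",
            "003\tApril Stevens\tStandard\t2010-12-20",
            "004\tNasser Hafez\tPremium\t2001-04-23");
        join(sales, accounts).forEach(System.out::println);
    }
}
```

The source tag prepended to each value is what lets the reduce step tell an account record from a sales record once both arrive under the same key. In a real job the grouping would be done by the framework rather than by a TreeMap.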