Broadcasting and distributing shared resources to tasks in a MapReduce job – Hadoop DistributedCache
We can use the Hadoop DistributedCache to distribute read-only file-based resources to the Map and Reduce tasks. These resources can be simple data files, archives, or JAR files that are needed for the computations performed by the Mappers or the Reducers.
How to do it...
The following steps show you how to add a file to the Hadoop DistributedCache and how to retrieve it from the Map and Reduce tasks:
- Copy the resource to HDFS. You can also use files that already exist in HDFS.
$ hadoop fs -copyFromLocal ip2loc.dat ip2loc.dat
- Add the resource to the DistributedCache from your driver program:
Job job = Job.getInstance(getConf());
job.addCacheFile(new URI("ip2loc.dat"));
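The tasks can then read the cached file in their setup() method. The following is a minimal sketch of that retrieval step; the IpLookupMapper class name and the tab-separated lookup-file format are assumptions for illustration, not part of the recipe:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical Mapper showing how a DistributedCache file added with
// job.addCacheFile() can be loaded before the map() calls begin.
public class IpLookupMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {

  private final Map<String, String> ipTable = new HashMap<>();

  @Override
  protected void setup(Context context)
      throws IOException, InterruptedException {
    // getCacheFiles() returns the URIs registered in the driver program.
    URI[] cacheFiles = context.getCacheFiles();
    if (cacheFiles != null && cacheFiles.length > 0) {
      // Hadoop localizes each cached file into the task's working
      // directory under its file name, so it can be opened directly.
      Path cachePath = new Path(cacheFiles[0].getPath());
      try (BufferedReader reader =
          new BufferedReader(new FileReader(cachePath.getName()))) {
        String line;
        while ((line = reader.readLine()) != null) {
          // Assumed format: <ip-prefix>\t<location>
          String[] fields = line.split("\t", 2);
          if (fields.length == 2) {
            ipTable.put(fields[0], fields[1]);
          }
        }
      }
    }
  }

  // map() would then consult ipTable for each input record.
}
```

The same pattern applies in a Reducer's setup() method, since Reducer.Context also exposes getCacheFiles().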