Time for action – using the Distributed Cache to improve location output

Let's now use the Distributed Cache to share a list of U.S. state names and abbreviations across the cluster:

  1. Create a data file called states.txt on the local filesystem. It should contain one state per line, with the abbreviation and full name separated by a tab. Alternatively, retrieve the file from this book's homepage. The file should start as follows:
    AL      Alabama
    AK      Alaska
    AZ      Arizona
    AR      Arkansas
    CA      California
    
    …
  2. Place the file on HDFS:
    $ hadoop fs -put states.txt states.txt
    
  3. Copy the previous UFOLocation.java file to UFOLocation2.java and add the following import statements:
    import java.io.*;
    import java.net.*;
    import java.util.*;
    import org.apache.hadoop.fs.Path;
    ...
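
To make the file format from step 1 concrete, here is a minimal sketch of how tab-separated abbreviation and name pairs could be read into a lookup table. It uses only the standard Java library and does not touch the Distributed Cache itself. The class name StateLookup and the methods load and expand are hypothetical names chosen for illustration; they are not taken from the book's UFOLocation2.java.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not the book's code.
class StateLookup {

    // Reads "abbreviation<TAB>full name" lines into a map.
    // Lines that do not contain exactly two tab-separated fields are skipped.
    // The caller remains responsible for closing the reader.
    static Map<String, String> load(Reader source) {
        Map<String, String> states = new HashMap<>();
        new BufferedReader(source).lines().forEach(line -> {
            String[] fields = line.split("\t");
            if (fields.length == 2) {
                states.put(fields[0].trim().toUpperCase(), fields[1].trim());
            }
        });
        return states;
    }

    // Returns the full state name, or the input unchanged if it is not a known abbreviation.
    static String expand(Map<String, String> states, String abbreviation) {
        String name = states.get(abbreviation.trim().toUpperCase());
        return name != null ? name : abbreviation;
    }

    public static void main(String[] args) {
        // In-memory sample standing in for the first lines of states.txt.
        String sample = "AL\tAlabama\nAK\tAlaska\nAZ\tArizona\n";
        Map<String, String> states = load(new StringReader(sample));
        System.out.println(expand(states, "AK"));
        System.out.println(expand(states, "ZZ"));
    }
}
```

Returning the original value for an unknown abbreviation is one possible design choice; it keeps records whose location field is not a U.S. state from being dropped or rewritten.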
