Time for action – generating shape summaries in MapReduce

In this section we will write a mapper that takes as input the UFO sighting record we defined earlier and outputs the shape together with a count of 1. The reducer will take these shape and count records and produce a new structured Avro datafile type containing the final count for each UFO shape. Perform the following steps:

  1. Copy the sightings.avro file to HDFS.
    $ hadoop fs -mkdir avroin
    $ hadoop fs -put sightings.avro avroin/sightings.avro
    
  2. Create the following as AvroMR.java:
    import java.io.IOException;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.*;
    import org.apache.avro.Schema.Type;
    import org.apache.avro.mapred.*;
    import org.apache.avro.reflect.ReflectData;
    import org.apache.avro.util.Utf8;
    ...
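The mapper and reducer described in this section follow the classic count-by-key pattern: emit (shape, 1) for every sighting, group by shape, then sum. As a minimal sketch, the flow can be simulated in plain Java without the Hadoop or Avro APIs, so the map, shuffle, and reduce steps sit side by side. The `Sighting` and `ShapeCount` types and the `map`, `shuffle`, `reduce`, and `run` methods are hypothetical names for illustration only; AvroMR.java itself works through Avro's `org.apache.avro.mapred` classes and the sighting record schema defined earlier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShapeCountSketch {

    // Hypothetical stand-in for the UFO sighting record; only the shape matters here.
    record Sighting(String shape) {}

    // Hypothetical (shape, count) pair, used for both map output and reduce output.
    record ShapeCount(String shape, long count) {}

    // Map phase: emit one (shape, 1) pair per input record.
    static List<ShapeCount> map(Sighting s) {
        List<ShapeCount> out = new ArrayList<>();
        out.add(new ShapeCount(s.shape(), 1L));
        return out;
    }

    // Shuffle phase: group the emitted counts by shape.
    // In a real job the framework does this between the map and reduce tasks.
    static Map<String, List<Long>> shuffle(List<ShapeCount> pairs) {
        Map<String, List<Long>> grouped = new TreeMap<>(); // sorted for deterministic output
        for (ShapeCount p : pairs) {
            grouped.computeIfAbsent(p.shape(), k -> new ArrayList<>()).add(p.count());
        }
        return grouped;
    }

    // Reduce phase: sum all the counts received for one shape.
    static ShapeCount reduce(String shape, List<Long> counts) {
        long total = 0;
        for (long c : counts) {
            total += c;
        }
        return new ShapeCount(shape, total);
    }

    // Run map -> shuffle -> reduce over an in-memory list of sightings.
    static List<ShapeCount> run(List<Sighting> input) {
        List<ShapeCount> mapped = new ArrayList<>();
        for (Sighting s : input) {
            mapped.addAll(map(s));
        }
        List<ShapeCount> result = new ArrayList<>();
        for (Map.Entry<String, List<Long>> e : shuffle(mapped).entrySet()) {
            result.add(reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Sighting> input = List.of(
                new Sighting("light"), new Sighting("disk"), new Sighting("light"));
        for (ShapeCount sc : run(input)) {
            System.out.println(sc.shape() + "\t" + sc.count());
        }
    }
}
```

The `TreeMap` is only there so the sketch prints shapes in a stable, sorted order; with the three sample sightings it reports `disk` once and `light` twice. In the Avro job the reducer would instead write each total as a record in the output Avro datafile.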
