Time for action – generating shape summaries in MapReduce
In this section we will write a mapper that takes as input the UFO sighting record we defined earlier and outputs the shape together with a count of 1. The reducer will take these shape and count records and produce a new structured Avro datafile containing the final counts for each UFO shape. Perform the following steps:
- Copy the sightings.avro file to HDFS.
  $ hadoop fs -mkdir avroin
  $ hadoop fs -put sightings.avro avroin/sightings.avro
- Create the following as AvroMR.java:
  import java.io.IOException;
  import org.apache.avro.Schema;
  import org.apache.avro.generic.*;
  import org.apache.avro.Schema.Type;
  import org.apache.avro.mapred.*;
  import org.apache.avro.reflect.ReflectData;
  import org.apache.avro.util.Utf8;
  ...
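The data flow described at the start of this section (the mapper emits each shape with a count of 1, and the reducer sums the counts per shape) can be modelled in a few lines of plain Java. The following is only a sketch of that logic and is not the AvroMR.java listing: it uses no Hadoop or Avro classes, the class name ShapeCountSketch and its methods are hypothetical, and the sample shape values are made up for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of the job's data flow; it does not use Hadoop or Avro.
final class ShapeCountSketch {

    // "Map" step: emit a (shape, 1) pair for the shape field of one sighting.
    static Map.Entry<String, Long> map(String shape) {
        return new SimpleEntry<>(shape, 1L);
    }

    // "Reduce" step: sum all the counts that were emitted for a single shape.
    static long reduce(List<Long> counts) {
        long total = 0;
        for (long count : counts) {
            total += count;
        }
        return total;
    }

    // Stands in for the framework: map every input, group by key, then reduce.
    static Map<String, Long> countShapes(List<String> shapes) {
        Map<String, List<Long>> grouped = new TreeMap<>();
        for (String shape : shapes) {
            Map.Entry<String, Long> pair = map(shape);
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        Map<String, Long> totals = new TreeMap<>();
        for (Map.Entry<String, List<Long>> entry : grouped.entrySet()) {
            totals.put(entry.getKey(), reduce(entry.getValue()));
        }
        return totals;
    }

    public static void main(String[] args) {
        // Hypothetical shape values, not taken from the real dataset.
        List<String> shapes = Arrays.asList("disk", "light", "disk", "oval");
        System.out.println(countShapes(shapes)); // prints {disk=2, light=1, oval=1}
    }
}
```

In the real job the grouping step is performed by the MapReduce framework between the map and reduce phases, and the keys and values are Avro types rather than String and Long.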