Time for action – summarizing the shape data
Just as we provided a summarization for the overall UFO data set earlier, let's now do a more focused summarization on the data provided for UFO shapes:
- Save the following to shapemapper.rb:
#!/usr/bin/env ruby
while line = gets
  parts = line.split("\t")
  if parts.size == 6
    shape = parts[3].strip
    puts shape+"\t1" if !shape.empty?
  end
end
- Make the file executable:
$ chmod +x shapemapper.rb
- Execute the job once again using the WordCount reducer:
$ hadoop jar hadoop/contrib/streaming/hadoop-streaming-1.0.3.jar -file shapemapper.rb -mapper shapemapper.rb -file wcreducer.rb -reducer wcreducer.rb -input ufo.tsv -output shapes
- Retrieve the shape info:
$ hadoop fs -cat shapes/part-00000
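To get a feel for what the job computes before running it on the cluster, the mapper's per-line logic and a WordCount-style tally can be sketched in a single Ruby script. This is only an illustrative sketch: `map_shape`, `count_shapes`, and the `SAMPLE` records are hypothetical names and made-up data, not part of the book's code or of ufo.tsv, and the in-memory hash merely stands in for Hadoop's shuffle and the reducer.

```ruby
#!/usr/bin/env ruby

# Hypothetical helper mirroring the per-line logic of shapemapper.rb:
# returns the shape field, or nil when the line should be skipped.
def map_shape(line)
  parts = line.split("\t")
  return nil unless parts.size == 6   # ignore records without six fields
  shape = parts[3].strip              # the shape is the fourth field
  shape.empty? ? nil : shape          # ignore records with no shape
end

# Stand-in for the shuffle plus a WordCount-style reduce:
# tally how many times each shape was emitted.
def count_shapes(lines)
  counts = Hash.new(0)
  lines.each do |line|
    shape = map_shape(line)
    counts[shape] += 1 if shape
  end
  counts
end

# Made-up sample records (assumption: six tab-separated fields per line).
SAMPLE = [
  "19950101\t19950102\tTown A\tdisk\t2 min\tdescription one",
  "19950103\t19950104\tTown B\t\t5 min\tdescription two",      # empty shape
  "19950105\t19950106\tTown C\tlight\t1 min\tdescription three",
  "19950107\t19950108\tTown D\tdisk\t10 sec\tdescription four",
  "malformed line with no tabs"                                  # wrong field count
]

# Print one "shape<TAB>count" line per shape, sorted by shape name.
count_shapes(SAMPLE).sort.each { |shape, n| puts "#{shape}\t#{n}" }
```

On the sample records above, only three lines carry a usable shape, so the script reports a count of 2 for disk and 1 for light; the record with an empty shape and the malformed line are dropped, just as the mapper would drop them.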
What just happened? ...