Time for action – implementing WordCount using Streaming
Let's flog the dead horse of WordCount one more time and implement it using Streaming by performing the following steps:
- Save the following file to wcmapper.rb:

    #!/usr/bin/env ruby
    # Mapper: read lines from standard input and emit "word<TAB>1" for each field.
    while line = gets
      words = line.split("\t")
      words.each { |word| puts word.strip + "\t1" }
    end
- Make the file executable by executing the following command:
$ chmod +x wcmapper.rb
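- Save the following file to wcreducer.rb:

    #!/usr/bin/env ruby
    # Reducer: input arrives sorted by key, so equal words are adjacent.
    current = nil
    count = 0
    while line = gets
      word, counter = line.split("\t")
      if word == current
        count = count + 1
      else
        # Key changed: emit the total for the previous word, then start a new count.
        puts current + "\t" + count.to_s if current
        current = word
        count = 1
      end
    end
    # Emit the final word, unless there was no input at all.
    puts current + "\t" + count.to_s if current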
- Make the file executable by executing the following command:
$ chmod +x wcreducer.rb ...
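
Before handing the scripts to Hadoop, it can help to check the logic on its own. The following is a minimal sketch, not part of the original listing, that mirrors the mapper and reducer logic as two plain Ruby methods and places a sort between them, roughly standing in for the sorting that happens between the map and reduce phases. The method names map_lines and reduce_lines and the sample input are hypothetical, chosen only for illustration.

```ruby
# Local simulation of the map -> sort -> reduce flow; no Hadoop is involved.

# Mapper logic: split each line on tabs and emit "word\t1" for every field.
def map_lines(lines)
  lines.flat_map do |line|
    line.split("\t").map { |word| "#{word.strip}\t1" }
  end
end

# Reducer logic: expects records sorted by key and counts adjacent equal keys.
def reduce_lines(sorted_records)
  output = []
  current = nil
  count = 0
  sorted_records.each do |record|
    word, _one = record.split("\t")
    if word == current
      count += 1
    else
      # Key changed: record the total for the previous word.
      output << "#{current}\t#{count}" if current
      current = word
      count = 1
    end
  end
  # Record the final word, unless there were no records at all.
  output << "#{current}\t#{count}" if current
  output
end

# Hypothetical sample input; tab-separated because the mapper splits on "\t".
sample = ["apple\tbanana\tapple\n", "banana\tcherry\n"]
mapped = map_lines(sample)
reduced = reduce_lines(mapped.sort)
puts reduced
```

The sort matters: the reducer only compares each word with the one immediately before it, so unsorted input would produce several partial counts for the same word.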