Time for action – implementing WordCount using Streaming

Let's flog the dead horse of WordCount one more time and implement it using Streaming by performing the following steps:

  1. Save the following file to wcmapper.rb:
    #!/usr/bin/env ruby
    
    while line = gets
        words = line.split
        words.each{ |word| puts word.strip+"\t1"}
    end
  2. Make the file executable by executing the following command:
    $ chmod +x wcmapper.rb
    
  3. Save the following file to wcreducer.rb:
    #!/usr/bin/env ruby
    
    current = nil
    count = 0
    
    while line = gets
        word, counter = line.split("\t")
    
        if word == current
            count = count+1
        else
            puts current+"\t"+count.to_s if current
            current = word
            count = 1
        end
    end
    puts current+"\t"+count.to_s if current
  4. Make the file executable by executing the following command:
    $ chmod +x wcreducer.rb ...
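
Before submitting anything to a cluster, it can help to check the two scripts' logic locally. Hadoop Streaming sorts the mapper output by key before the reducer reads it, which is why the reducer only needs to compare each word with the previous one. The following is a minimal sketch of that map, sort, and reduce flow in a single Ruby process. The method names `map_line`, `reduce_sorted`, and `word_count` are hypothetical helpers invented for this illustration, and the sketch assumes words are separated by whitespace.

```ruby
# Hypothetical in-process stand-in for the map -> sort -> reduce flow.
# These helpers are illustrative only; they are not part of Hadoop.

# Mapper logic: emit one "word\t1" record per whitespace-separated token.
def map_line(line)
  line.split.map { |word| "#{word}\t1" }
end

# Reducer logic: records arrive sorted by key, so equal words are adjacent
# and a single running counter is enough.
def reduce_sorted(records)
  results = []
  current = nil
  count = 0
  records.each do |record|
    word, _counter = record.chomp.split("\t")
    if word == current
      count += 1
    else
      # A new word begins; flush the total for the previous one, if any.
      results << "#{current}\t#{count}" if current
      current = word
      count = 1
    end
  end
  # Flush the final word (skipped when the input was empty).
  results << "#{current}\t#{count}" if current
  results
end

# Glue: map every line, sort the records by key, then reduce.
def word_count(lines)
  mapped = lines.flat_map { |line| map_line(line) }
  reduce_sorted(mapped.sort)
end

puts word_count(["the cat sat", "the cat"])
```

The same check can be approximated on the command line by piping a small text file through the mapper, the standard `sort` utility, and then the reducer, with `sort` standing in for the shuffle phase.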

