Hadoop can be downloaded and installed from https://hadoop.apache.org/. We'll use the Hadoop Streaming API to run our Python MapReduce programs on Hadoop. The Streaming API lets any program that reads from standard input and writes to standard output serve as the mapper or reducer of a MapReduce job.
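To make the standard input and output contract concrete, here is a minimal sketch that imitates the map, sort, and reduce stages on a single machine. The function names (run_mapper, run_reducer, simulate_streaming) are hypothetical and chosen for illustration only. The sketch assumes the tab-separated key and value lines that Hadoop Streaming uses by default. In a real job the mapper and reducer would be separate scripts reading sys.stdin and writing sys.stdout, and Hadoop itself would do the sorting.

```python
import io
from itertools import groupby


def run_mapper(stdin, stdout):
    """Mapper: emit one 'word<TAB>1' line for every word read from stdin."""
    for line in stdin:
        for word in line.strip().split():
            stdout.write(f"{word}\t1\n")


def run_reducer(stdin, stdout):
    """Reducer: sum the counts of consecutive lines that share a key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in stdin)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(value) for _, value in group)
        stdout.write(f"{key}\t{total}\n")


def simulate_streaming(text):
    """Imitate map -> sort -> reduce locally (an illustration, not Hadoop)."""
    mapped = io.StringIO()
    run_mapper(io.StringIO(text), mapped)
    # Hadoop sorts the mapper output by key before the reducer sees it;
    # sorting the lines here stands in for that step.
    shuffled = "".join(sorted(mapped.getvalue().splitlines(keepends=True)))
    reduced = io.StringIO()
    run_reducer(io.StringIO(shuffled), reduced)
    return reduced.getvalue()


if __name__ == "__main__":
    print(simulate_streaming("good food\ngood service\n"), end="")
```

Because the reducer only ever compares neighbouring lines, it depends on the input being sorted by key, which is why the sort step sits between the two stages.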
We'll write three MapReduce programs in Python:
- A basic word count
- Getting the sentiment score of each review
- Getting the overall sentiment score from all the reviews
The basic word count
We'll start with the word count MapReduce. Save the following code in a
import sys

for l in sys.stdin:
    # Leading and trailing white space is removed
    l = l.strip()
    # The line is split into words
    word_tokens = l.split()
    # Key Value ...