March 2016
Beginner to intermediate
290 pages
5h 46m
English
Twitter is one of the most important data sources for gauging public sentiment on a given topic. In this recipe, we will take a look at how to perform sentiment analysis on Twitter data using Hive.
To perform this recipe, you need a running Hadoop cluster with the latest version of Hive installed on it. Here, I am using Hive 1.2.1.
First of all, we need a dataset for this recipe. We will use the one available at http://s3.amazonaws.com/hw-sandbox/tutorial13/SentimentFiles.zip.
Next, we will unzip this data and upload it to HDFS. The zip contains three folders: the first for raw Twitter data, the second for a dictionary, and the third ...
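The unzip-and-upload step can be sketched in Python as follows. This is a minimal sketch, not the recipe's own commands: it assumes the archive has already been downloaded locally and that the `hdfs` command-line client is on the PATH. The helper names (`extract_archive`, `build_upload_commands`, `upload`) and any HDFS target directory you pass in are hypothetical choices for illustration.

```python
import subprocess
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir):
    """Extract the archive into dest_dir and return its top-level entries, sorted."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    return sorted(dest.iterdir())


def build_upload_commands(local_dir, hdfs_dir):
    """Build the `hdfs dfs` commands that copy local_dir into hdfs_dir.

    hdfs_dir is an assumed target location; pick one that suits your cluster.
    """
    return [
        # Create the target directory (and any missing parents) in HDFS.
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],
        # Copy the extracted folder from the local file system into HDFS.
        ["hdfs", "dfs", "-put", str(local_dir), hdfs_dir],
    ]


def upload(local_dir, hdfs_dir):
    """Run the upload commands; requires a reachable Hadoop cluster."""
    for cmd in build_upload_commands(local_dir, hdfs_dir):
        subprocess.run(cmd, check=True)
```

For example, calling `extract_archive("SentimentFiles.zip", "sentiment_data")` and then `upload(folder, "/user/hive/sentiment")` for each returned folder would place the extracted data in HDFS, where `/user/hive/sentiment` is only a placeholder path. Building the commands separately from running them makes it easy to print and review them before touching the cluster.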