In Chapter 11, Sentiment Analysis of Twitter Data, we explored how to create a bag of words from the
Sentiment140 tweets dataset. In this chapter, we will complement that example by using MongoDB. First, we will prepare and transform the dataset from CSV to JSON format so that we can add it to a MongoDB collection.
We can download the Sentiment140 training and test data from http://help.sentiment140.com/for-students.
We will download and open the test data. The columns represent sentiment, id, date, via, user, and text. The first five records look like this:
4,1,Mon May 11 03:21:41 UTC 2009,kindle2,yamarama,@mikefish Fair enough. But i have the Kindle2 and I think it's perfect :)
4,2,Mon May 11 03:26:10 UTC 2009,jquery,dcostalis,Jquery ...
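As a minimal sketch of the CSV-to-JSON transformation described above, the following Python snippet reads rows with the standard csv module and turns each one into a dictionary that can be serialized as JSON. The field names simply follow the column order given in the text. The function name csv_to_documents, the in-memory sample, and the choice to cast sentiment and id to integers are assumptions made for illustration, not part of the original example. It also assumes that any tweet text containing commas is quoted in the file, which the csv module handles.

```python
import csv
import io
import json

# Column names in the order described in the text.
FIELDS = ["sentiment", "id", "date", "via", "user", "text"]


def csv_to_documents(csv_file):
    """Yield one dict per CSV row (hypothetical helper for illustration)."""
    reader = csv.DictReader(csv_file, fieldnames=FIELDS)
    for row in reader:
        # Assumption: store the numeric columns as integers, not strings.
        row["sentiment"] = int(row["sentiment"])
        row["id"] = int(row["id"])
        yield dict(row)


# In-memory sample built from the first record shown above; a real run
# would pass an open file object for the downloaded test data instead.
sample = io.StringIO(
    "4,1,Mon May 11 03:21:41 UTC 2009,kindle2,yamarama,"
    "@mikefish Fair enough. But i have the Kindle2 and I think it's perfect :)\n"
)

documents = list(csv_to_documents(sample))
print(json.dumps(documents[0], indent=2))
```

Each resulting dictionary maps directly onto a JSON document, so a list like documents could later be passed to a MongoDB driver call such as pymongo's insert_many once a collection is available.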