Practical Data Analysis by Hector Cuesta

Data preparation

In Chapter 11, Sentiment Analysis of Twitter Data, we explored how to create a bag of words from the tweets in the Sentiment140 dataset. In this chapter, we will complement that example by using MongoDB. First, we will prepare the dataset and transform it from CSV to JSON so that it can be added to a MongoDB collection.

Tip

We can download the Sentiment140 training and test data from http://help.sentiment140.com/for-students.

We will download and open the test data. Its columns represent sentiment, id, date, via, user, and text. The first five records will look like this:

4,1,Mon May 11 03:21:41 UTC 2009,kindle2,yamarama,@mikefish  Fair enough. But i have the Kindle2 and I think it's perfect  :)
4,2,Mon May 11 03:26:10 UTC 2009, jquery,dcostalis,Jquery ...
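The CSV-to-JSON step described above can be sketched in Python as follows. This is a minimal illustration rather than the book's own listing: the helper names `row_to_document` and `csv_to_json_lines` are hypothetical, the field names are taken from the column description above, and converting sentiment and id to integers is an assumption made for convenience. The output is one JSON document per line.

```python
import csv
import io
import json

# Field names follow the column description in the text; their order is
# assumed to match the order of the values in each record.
FIELDS = ["sentiment", "id", "date", "via", "user", "text"]


def row_to_document(row):
    """Turn one parsed CSV row into a dict (hypothetical helper)."""
    # Re-join any extra fields: an unquoted comma inside the tweet text
    # would otherwise split the text across several columns.
    head, text = row[:5], ",".join(row[5:])
    doc = dict(zip(FIELDS, [field.strip() for field in head] + [text]))
    # Assumption: numeric sentiment and id are more convenient to query.
    doc["sentiment"] = int(doc["sentiment"])
    doc["id"] = int(doc["id"])
    return doc


def csv_to_json_lines(csv_file, json_file):
    """Write one JSON document per line and return the number written."""
    count = 0
    for row in csv.reader(csv_file):
        if len(row) < len(FIELDS):
            continue  # skip blank or incomplete records
        json_file.write(json.dumps(row_to_document(row)) + "\n")
        count += 1
    return count


# The first record shown above, used as in-memory sample input.
sample = io.StringIO(
    "4,1,Mon May 11 03:21:41 UTC 2009,kindle2,yamarama,"
    "@mikefish  Fair enough. But i have the Kindle2 and I think it's perfect  :)\n"
)
output = io.StringIO()
converted = csv_to_json_lines(sample, output)
print(output.getvalue(), end="")
```

To convert the downloaded file instead of the in-memory sample, the same function could be called with real file objects, opening the CSV with `newline=""` as the `csv` module documentation recommends.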
