Data preparation

In Chapter 11, Working with Twitter Data, we explored how to create a bag of words from the Sentiment140 tweets dataset. In this chapter, we will complement that example using MongoDB. First, we will prepare the dataset by transforming it from CSV into JSON so that it can be loaded into a MongoDB collection.

Tip

We can download the Sentiment140 training and test data at http://help.sentiment140.com/for-students.

We will download and open the test data; the columns represent sentiment, id, date, via, user, and text. The first five records will look similar to this:

4,1,Mon May 11 03:21:41 UTC 2009,kindle2,yamarama,@mikefish Fair enough. But i have the Kindle2 and I think it's perfect :)
4,2,Mon May 11 03:26:10 UTC 2009, jquery,dcostalis,Jquery ...
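As a rough illustration of the CSV-to-JSON step described above, the following sketch turns records shaped like these into JSON-ready dictionaries using only the Python standard library. The field names come from the column list given earlier; the function name `csv_to_documents`, the integer casting of `sentiment` and `id`, and the handling of stray commas in the tweet text are assumptions made for this example, not necessarily the approach used in the rest of the chapter.

```python
import csv
import io
import json

# Column names taken from the description of the test data above.
FIELDS = ["sentiment", "id", "date", "via", "user", "text"]


def csv_to_documents(csv_text):
    """Convert Sentiment140-style CSV text into a list of dictionaries.

    Hypothetical helper for illustration only.
    """
    documents = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < len(FIELDS):
            continue  # skip blank or incomplete lines
        # Assumption: if the tweet text is unquoted and contains commas,
        # everything after the fifth column belongs to the text field.
        head, text = row[:5], ",".join(row[5:])
        doc = dict(zip(FIELDS, head + [text]))
        # Assumption: store sentiment and id as integers.
        doc["sentiment"] = int(doc["sentiment"])
        doc["id"] = int(doc["id"])
        documents.append(doc)
    return documents


sample = (
    "4,1,Mon May 11 03:21:41 UTC 2009,kindle2,yamarama,"
    "@mikefish Fair enough. But i have the Kindle2 and I think it's perfect :)\n"
)
docs = csv_to_documents(sample)
print(json.dumps(docs[0], indent=2))
```

A list of dictionaries like `docs` can be written to a file with `json.dump`, or, if the pymongo driver is used, passed to a collection's `insert_many` method.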

Practical Data Analysis - Second Edition by Hector Cuesta, Dr. Sampath Kumar