Data preparation
In Chapter 11, Working with Twitter Data, we explored how to create a bag of words from the Sentiment140 tweets dataset. In this chapter, we will extend that example using MongoDB. First, we will prepare the dataset by converting it from CSV to JSON so that we can insert it into a MongoDB collection.
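The final step of that plan, inserting the converted documents, could look like the following minimal sketch. It assumes the documents arrive as an iterable of dictionaries and that `collection` is a pymongo `Collection`, for example `MongoClient()["twitter"]["sentiment140"]`. The database and collection names, and the helper `insert_in_batches`, are hypothetical. The function consumes the iterable lazily in fixed-size batches, so memory use stays bounded even for the much larger training file.

```python
from itertools import islice


def insert_in_batches(collection, documents, batch_size=1000):
    """Insert an iterable of dicts into `collection`, batch_size at a time.

    `collection` is assumed to be anything with an insert_many(list) method
    whose result exposes `inserted_ids` (a pymongo Collection qualifies).
    Returns the total number of documents inserted.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    iterator = iter(documents)
    total = 0
    while True:
        # Pull at most batch_size documents without materializing the rest.
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        result = collection.insert_many(batch)
        total += len(result.inserted_ids)
    return total
```

Because the helper only relies on `insert_many`, it can be exercised against a stand-in object before pointing it at a real MongoDB server.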
Tip
We can download the Sentiment140 training and test data from http://help.sentiment140.com/for-students.
We will download and open the test data. Its columns represent the sentiment (0 for negative, 2 for neutral, 4 for positive), id, date, query, user, and text of each tweet. The first five records will look similar to this:
4,1,Mon May 11 03:21:41 UTC 2009,kindle2,yamarama,@mikefish Fair enough. But i have the Kindle2 and I think it's perfect :)
4,2,Mon May 11 03:26:10 UTC 2009, jquery,dcostalis,Jquery ...
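A minimal sketch of the CSV-to-JSON conversion, assuming the six columns above and the date layout shown in the sample. The helper names `row_to_document` and `csv_to_json_lines` are hypothetical. Each record becomes one JSON object per line; storing the date as an ISO 8601 string is a design choice made here so that `json.dumps` can serialize it. If a record's text contains unquoted commas, the extra fields are re-joined into the text column.

```python
import csv
import json
from datetime import datetime

# Column order as described above; "query" is the search term used to collect the tweet.
FIELDS = ["sentiment", "id", "date", "query", "user", "text"]

# Sample dates look like "Mon May 11 03:21:41 UTC 2009".
DATE_FORMAT = "%a %b %d %H:%M:%S UTC %Y"


def row_to_document(row):
    """Map one six-field CSV row to a JSON-ready dict."""
    # Strip stray whitespace around every field (e.g. " jquery").
    doc = dict(zip(FIELDS, (value.strip() for value in row)))
    doc["sentiment"] = int(doc["sentiment"])
    doc["id"] = int(doc["id"])
    # Keep the timestamp as ISO 8601 text so json.dumps can serialize it.
    doc["date"] = datetime.strptime(doc["date"], DATE_FORMAT).isoformat()
    return doc


def csv_to_json_lines(lines):
    """Yield one JSON string per CSV record (newline-delimited JSON)."""
    for row in csv.reader(lines):
        if len(row) < len(FIELDS):
            continue  # blank or truncated line
        if len(row) > len(FIELDS):
            # Unquoted commas inside the tweet split the text; re-join them.
            row = row[:5] + [",".join(row[5:])]
        yield json.dumps(row_to_document(row))
```

To convert the test file, open it with `newline=""` (and an `encoding` that matches the downloaded file) and write each yielded string followed by a newline. `mongoimport` reads one JSON document per line by default, so the resulting file can also be loaded with that tool instead of a Python driver.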