Analysing sentiment

After 4 days of intense processing, we extracted around 10 million tweets; representing approximately 30 GB worth of JSON data.

Massaging Twitter data

One of the key reasons Twitter became so popular is that any message has to fit into a maximum of 140 characters. The drawback is also that every message has to fit into a maximum of 140 characters! Hence, the result is massive increase in the use of abbreviations, acronyms, slang words, emoticons, and hashtags. In this case, the main emotion may no longer come from the text itself, but rather from the emoticons used (http://dl.acm.org/citation.cfm?id=1628969), though some studies showed that the emoticons may sometimes lead to inadequate predictions in sentiment (https://arxiv.org/pdf/1511.02556.pdf ...

Get Mastering Spark for Data Science now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.