July 2017
Beginner to intermediate
715 pages
17h 3m
English
Data cleaning is a critical step in most data science problems. Data that is not properly cleaned may have errors such as misspellings, inconsistent representation of elements such as dates, and extraneous words.
There are numerous data cleaning options that we can apply to Twitter data. For this application, we perform only simple cleaning and filter out certain tweets.
Converting the text to lowercase is straightforward. The method shown here also trims leading and trailing whitespace and returns the handler itself so that calls can be chained:
public TweetHandler toLowerCase() {
    this.text = this.text.toLowerCase().trim();
    return this;
}
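To see the method in context, the following is a minimal, self-contained sketch. The class name TweetHandlerSketch, its constructor, and the getText accessor are hypothetical stand-ins, since the full TweetHandler class is not shown here. Passing Locale.ROOT is also an assumption added for this sketch: it keeps the lowercase result independent of the machine's default locale, which the original method does not do.

```java
import java.util.Locale;

// Hypothetical stand-in for the TweetHandler class; only the text field is modeled.
class TweetHandlerSketch {
    private String text;

    TweetHandlerSketch(String text) {
        this.text = text;
    }

    // Lowercases and trims the text, then returns this so calls can be chained.
    // Locale.ROOT is an assumption of this sketch, not part of the original method.
    TweetHandlerSketch toLowerCase() {
        this.text = this.text.toLowerCase(Locale.ROOT).trim();
        return this;
    }

    // Hypothetical accessor used to inspect the cleaned text.
    String getText() {
        return text;
    }

    public static void main(String[] args) {
        TweetHandlerSketch handler =
                new TweetHandlerSketch("  Learning JAVA for Data Science  ");
        // Prints: learning java for data science
        System.out.println(handler.toLowerCase().getText());
    }
}
```

Because toLowerCase returns the handler, further cleaning steps written in the same style could be appended to the same call chain.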
Part of the process is to remove certain tweets that are not needed. For example, the following code illustrates how to detect whether the tweet ...