Solving an easy problem first

As we saw when we looked at our tweet data, the tweets are not only positive or negative. The majority actually contain no sentiment at all, but are neutral or irrelevant, carrying, for instance, raw information (for example, New book: Building Machine Learning ... http://link). This leads to four classes: positive, negative, neutral, and irrelevant. To avoid complicating the task too much, let's focus only on the positive and negative tweets for now:

>>> # first create a Boolean list having true for tweets
>>> # that are either positive or negative
>>> pos_neg_idx = np.logical_or(Y_orig=="positive", Y_orig=="negative")

>>> # now use that index to filter the data and the labels
>>> X = X_orig[pos_neg_idx]
>>> Y = Y_orig[pos_neg_idx]

...
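To see what the Boolean index does, here is a minimal, self-contained sketch of the same filtering step. The five tweets and their labels are made-up stand-ins for the real X_orig and Y_orig, and the comparison with np.isin is an illustrative alternative, not something the original code uses:

```python
import numpy as np

# Hypothetical toy data standing in for the real tweets and labels
X_orig = np.array(["love it", "new book out", "hate it", "hola amigos", "so good"])
Y_orig = np.array(["positive", "neutral", "negative", "irrelevant", "positive"])

# Element-wise comparison gives one Boolean array per label;
# logical_or combines them into a single mask
pos_neg_idx = np.logical_or(Y_orig == "positive", Y_orig == "negative")

# Indexing with the mask keeps only the entries where it is True
X = X_orig[pos_neg_idx]
Y = Y_orig[pos_neg_idx]

# An equivalent mask built with np.isin (illustrative alternative)
alt_idx = np.isin(Y_orig, ["positive", "negative"])

print(pos_neg_idx)
print(X)
print(Y)
```

Because the same mask is applied to both arrays, each remaining tweet in X stays aligned with its label in Y.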
