As we saw when we looked at our tweet data, the tweets are not only positive or negative. The majority of them actually contain no sentiment at all, but are neutral or irrelevant, containing, for instance, raw information (for example, New book: Building Machine Learning ... http://link). This leads to four classes: positive, negative, neutral, and irrelevant. To avoid complicating the task too much, let's focus only on the positive and negative tweets for now:
>>> # first create a Boolean list having true for tweets
>>> # that are either positive or negative
>>> pos_neg_idx = np.logical_or(Y_orig == "positive", Y_orig == "negative")
>>> # now use that index to filter the data and the labels
>>> X = X_orig[pos_neg_idx]
>>> Y = Y_orig[pos_neg_idx]
...
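To see what this filtering does, here is a small self-contained sketch of the same Boolean-mask technique. The four tweets and their labels are made up for illustration and merely stand in for the real X_orig and Y_orig, which we assume are NumPy arrays of equal length:

```python
import numpy as np

# Hypothetical toy data standing in for the real tweets and labels
X_orig = np.array([
    "I love this phone",
    "New book: Building Machine Learning ...",
    "Worst service ever",
    "What time is the keynote?",
])
Y_orig = np.array(["positive", "irrelevant", "negative", "neutral"])

# Boolean mask: True where the label is either positive or negative
pos_neg_idx = np.logical_or(Y_orig == "positive", Y_orig == "negative")

# Boolean indexing keeps only the entries where the mask is True,
# so the tweets and the labels stay aligned
X = X_orig[pos_neg_idx]
Y = Y_orig[pos_neg_idx]

print(pos_neg_idx)
print(X)
print(Y)
```

Because the same mask is applied to both arrays, each remaining tweet still lines up with its label. If more than two labels ever need to be kept, np.isin(Y_orig, ["positive", "negative"]) should build an equivalent mask without nesting np.logical_or calls.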