July 2018
Beginner to intermediate
406 pages
9h 55m
English
Nevertheless, using these linguistic features in isolation, without the words themselves, will not take us very far. We therefore have to combine the output of TfidfVectorizer with the linguistic features. This can be done with scikit-learn's FeatureUnion class. It is initialized in the same manner as Pipeline; however, instead of evaluating the estimators in sequence, with each one passing its output to the next, FeatureUnion evaluates them in parallel and joins the output vectors afterward:
def create_union_model(params=None):
    def preprocessor(tweet):
        tweet = tweet.lower()
        for k in emo_repl_order:
            tweet = tweet.replace(k, emo_repl[k])
        for r, repl in re_repl.items():
            tweet = re.sub(r, repl, tweet)
        return ...
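Since the listing above is cut off, the following is a minimal, self-contained sketch of the FeatureUnion idea rather than the book's own model. The SimpleStats transformer, the build_union helper, and the sample tweets are hypothetical stand-ins for the linguistic-feature extractor and the data; only TfidfVectorizer and FeatureUnion come from scikit-learn.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion


class SimpleStats(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for a linguistic-feature extractor.

    Emits two numbers per document: the word count and the number
    of exclamation marks.
    """

    def fit(self, documents, y=None):
        # Nothing to learn; the features are computed per document.
        return self

    def transform(self, documents):
        return np.array(
            [[len(d.split()), d.count("!")] for d in documents],
            dtype=float,
        )


def build_union():
    # Like Pipeline, FeatureUnion takes a list of (name, estimator) pairs.
    # Unlike Pipeline, every transformer sees the same input, and the
    # resulting feature matrices are concatenated column-wise.
    return FeatureUnion([
        ("tfidf", TfidfVectorizer()),
        ("stats", SimpleStats()),
    ])


# Made-up example tweets, used only to exercise the union.
tweets = ["I love this phone!", "worst service ever", "not bad, not great"]

union = build_union()
features = union.fit_transform(tweets)

# Number of columns contributed by the fitted TfidfVectorizer.
n_tfidf = len(union.transformer_list[0][1].vocabulary_)
print(features.shape, n_tfidf)
```

The joined matrix has one row per tweet; its columns are the TF-IDF columns followed by the two columns from SimpleStats. In a real model, the union would typically be the first step of a Pipeline that ends in a classifier.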