November 2019
346 pages
In the following steps, we convert a corpus of text data into a numerical form that machine learning algorithms can consume:
with open("anonops_short.txt", encoding="utf8") as f:
    anonops_chat_logs = f.readlines()
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.feature_extraction.text import TfidfTransformer

# Hash word 1-grams and 2-grams of each chat line into a fixed-width sparse matrix.
my_vector = HashingVectorizer(input="content", ngram_range=(1, 2))
X_train_counts = my_vector.fit_transform(anonops_chat_logs)

# Learn IDF weights from the hashed features, then reweight them.
tf_transformer = TfidfTransformer(use_idf=True).fit(X_train_counts)
X_train_tf = tf_transformer.transform(X_train_counts)
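To make the two stages above less opaque, here is a minimal pure-Python sketch of the same idea: hash each n-gram to a column index, count occurrences per column, then reweight by smoothed IDF and L2-normalize each row. It is an illustration under simplifying assumptions, not scikit-learn's implementation. The names N_BUCKETS, ngrams, hash_counts, tfidf, and toy_logs are hypothetical, the three sample lines are made up, zlib.crc32 stands in for the MurmurHash3 function scikit-learn uses, tokens are split on whitespace rather than with scikit-learn's token pattern, and the sign alternation and row normalization that HashingVectorizer applies by default are left out.

```python
import math
import zlib
from collections import Counter

# Hypothetical, tiny for readability; scikit-learn defaults to 2**20 columns.
N_BUCKETS = 16


def ngrams(text, max_n=2):
    """Return the word 1-grams through max_n-grams of a whitespace-split line."""
    words = text.lower().split()
    return [" ".join(words[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(words) - n + 1)]


def hash_counts(text):
    """Count n-grams per hashed column; no vocabulary is ever stored."""
    return Counter(zlib.crc32(g.encode("utf8")) % N_BUCKETS for g in ngrams(text))


def tfidf(rows):
    """Reweight column counts by smoothed IDF, then L2-normalize each row."""
    n_docs = len(rows)
    # Iterating a Counter yields each column once, so this is document frequency.
    doc_freq = Counter(col for row in rows for col in row)
    out = []
    for row in rows:
        weighted = {col: count * (math.log((1 + n_docs) / (1 + doc_freq[col])) + 1)
                    for col, count in row.items()}
        norm = math.sqrt(sum(v * v for v in weighted.values())) or 1.0
        out.append({col: v / norm for col, v in weighted.items()})
    return out


# Made-up stand-in for a few chat lines.
toy_logs = ["hello channel", "anyone know the target", "hello again channel"]
counts = [hash_counts(line) for line in toy_logs]
weights = tfidf(counts)
```

Because the column index comes straight from the hash, nothing has to be fitted before vectorizing, which is why this approach scales to large corpora. The trade-off is that distinct n-grams can collide in the same column, and columns cannot be mapped back to the n-grams that produced them.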