December 2018
Intermediate to advanced
318 pages
8h 28m
English
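TF-IDF (term frequency-inverse document frequency) measures how important a word is to a particular document relative to a whole corpus of documents. A word scores highly when it occurs often in that document but rarely across the rest of the corpus.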
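To make the definition concrete, here is a minimal sketch of the classic formulation, tf-idf(t, d) = tf(t, d) × log(N / df(t)). Here tf is the term's count in a document divided by the document's length, N is the number of documents, and df(t) is the number of documents that contain t. This is the textbook variant. scikit-learn's TfidfVectorizer, used below, applies a smoothed idf and L2-normalizes each row by default, so its numbers will differ. The function name `tf_idf` and the toy corpus are illustrative only.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document (each doc is a list of tokens)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        # Term frequency (normalized by document length) times inverse document frequency.
        weights.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Toy corpus of already-tokenized URLs (hypothetical examples).
corpus = [
    ["google", "search"],
    ["google", "mail"],
    ["paypal", "login", "verify"],
]
for row in tf_idf(corpus):
    print({t: round(w, 3) for t, w in row.items()})
```

Because `google` appears in two of the three documents, it receives a lower weight in the first document than `search`, which appears in only one. A term present in every document would get log(1) = 0.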
Treating each URL as a document, we generate TF-IDF features from the URLs with the following code:
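```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# url_cleanse (tokenizer), inputurls (URLs) and y (labels) are assumed to be defined earlier.
url_vectorizer = TfidfVectorizer(tokenizer=url_cleanse)
x = url_vectorizer.fit_transform(inputurls)  # sparse TF-IDF matrix, one row per URL
# Hold out 20% of the URLs for testing; fix the seed so the split is reproducible.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
```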
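The code above passes a custom tokenizer, `url_cleanse`, whose definition is not shown in this excerpt. As a hypothetical stand-in, a URL tokenizer might lowercase the URL, split it on any non-alphanumeric separator (`/`, `.`, `-`, and so on), and drop pieces that carry little signal, such as `com` or `www`. The function name `url_tokenize`, the separator rule, and the stop-token list are assumptions chosen for illustration, not the book's implementation.

```python
import re

# Hypothetical stop tokens: pieces that appear in a large share of URLs.
STOP_TOKENS = {"com", "www", "http", "https", "html", "php"}

def url_tokenize(url):
    """Hypothetical stand-in for url_cleanse: split a URL into lowercase tokens."""
    # Split on any run of characters that are not lowercase letters or digits.
    pieces = re.split(r"[^a-z0-9]+", url.lower())
    # Drop empty strings left by leading/trailing separators, then drop stop tokens.
    return [p for p in pieces if p and p not in STOP_TOKENS]

print(url_tokenize("https://www.Example.com/login-verify/index.php"))
```

A function like this can be passed as `TfidfVectorizer(tokenizer=url_tokenize)`, in the same way `url_cleanse` is passed above, so each URL is turned into a bag of these tokens before TF-IDF weighting.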
We then train a logistic regression classifier on the training split and measure its accuracy on the test split:
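```python
from sklearn.linear_model import LogisticRegression

l_regress = LogisticRegression()  # logistic regression classifier
l_regress.fit(x_train, y_train)
l_score = l_regress.score(x_test, y_test)  # mean accuracy on the held-out URLs
print("score: {0:.2f} %".format(100 * l_score))
url_vectorizer_save = url_vectorizer  # keep the fitted vectorizer for saving
```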
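For a classifier, `score` returns the mean accuracy: the fraction of test URLs whose predicted label matches the true one, which is equivalent to comparing `l_regress.predict(x_test)` with `y_test`. The `print` call then scales that fraction to a percentage. A plain-Python sketch of the same arithmetic follows, with made-up 0/1 labels standing in for `y_test` and the model's predictions; the helper name `accuracy` is illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Made-up labels for illustration only.
y_true = [1, 0, 0, 1]
y_pred = [1, 0, 1, 1]
l_score = accuracy(y_true, y_pred)
print("score: {0:.2f} %".format(100 * l_score))
```

Three of the four predictions match, so the sketch reports a score of 75.00 %.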
Finally, we save the model and the vectorizer to a file so ...
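The excerpt breaks off before showing how the file is written. One common approach, assumed here rather than taken from the book, is to serialize both fitted objects with Python's standard `pickle` module so they can be reloaded later without retraining. The helper names `save_artifacts` and `load_artifacts` are hypothetical.

```python
import pickle

def save_artifacts(model, vectorizer, path):
    """Hypothetical helper: pickle the fitted model and vectorizer together in one file."""
    with open(path, "wb") as f:
        pickle.dump({"model": model, "vectorizer": vectorizer}, f)

def load_artifacts(path):
    """Hypothetical helper: return the (model, vectorizer) pair written by save_artifacts."""
    with open(path, "rb") as f:
        saved = pickle.load(f)
    return saved["model"], saved["vectorizer"]
```

With the objects from the listings above, this could be called as `save_artifacts(l_regress, url_vectorizer_save, "url_model.pkl")` (the file name is hypothetical). Storing both objects together matters because new URLs must be transformed by the same fitted vectorizer before the model can score them.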