Tuning the classifier's parameters

We have not yet explored the current setup thoroughly and should investigate further. There are roughly two areas where we can turn the knobs: TfidfVectorizer and MultinomialNB. As we have no real intuition about which area to explore first, let's sweep the hyperparameters of both.

We will look at the TfidfVectorizer parameters first:

  • Using different settings for ngrams:
    • unigrams (1,1)
    • unigrams and bigrams (1,2)
    • unigrams, bigrams, and trigrams (1,3)
  • Playing with min_df: 1 or 2
  • Exploring the impact of IDF within TF-IDF using use_idf and smooth_idf: False or True
  • Whether to remove stop words or not, by setting stop_words to "english" or None
  • Whether to use the logarithm of the word counts (sublinear_tf)
  • Whether ...
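
To get a feel for how large such a sweep becomes, the fully listed options above can be written down as a parameter grid and enumerated. This is only a minimal sketch: the "vect__" prefix assumes a scikit-learn Pipeline whose TfidfVectorizer step is named "vect" (a hypothetical name chosen here), the helper iter_settings is ours, and only the options that are spelled out in the list are included.

```python
from itertools import product

# Candidate values taken from the list above. The "vect__" prefix assumes
# a Pipeline step named "vect" holding the TfidfVectorizer (hypothetical).
param_grid = {
    "vect__ngram_range": [(1, 1), (1, 2), (1, 3)],
    "vect__min_df": [1, 2],
    "vect__use_idf": [False, True],
    "vect__smooth_idf": [False, True],
    "vect__stop_words": [None, "english"],
    "vect__sublinear_tf": [False, True],
}


def iter_settings(grid):
    """Yield one dict per combination of the candidate values."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


settings = list(iter_settings(param_grid))
print(len(settings))  # 3 * 2 * 2 * 2 * 2 * 2 = 96 combinations
```

Even this partial grid yields 96 vectorizer settings, each of which would have to be trained and evaluated. A dictionary of this shape is also what scikit-learn's GridSearchCV accepts as its param_grid argument, provided the pipeline really does use the step name assumed here.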
