Comparing results with a dummy classifier

The scikit-learn DummyClassifier class implements several simple prediction strategies that ignore the input features, such as random guessing or always predicting one class. These can serve as a baseline for real classifiers. The strategies are as follows:

  • stratified: This predicts randomly, sampling classes according to the training set class distribution
  • most_frequent: This always predicts the most frequent class in the training set
  • prior: This is available from scikit-learn 0.17 onward and always predicts the class that maximizes the class prior
  • uniform: This samples classes randomly from a uniform distribution
  • constant: This always predicts a user-specified class
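The strategies listed above can be tried side by side in a minimal sketch. The toy data here (ten samples with seven labels of class 0 and three of class 1) is hypothetical and is not the dataset used in the recipe; the constant class of 1 is likewise an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Hypothetical toy data: 10 samples with imbalanced labels (7 zeros, 3 ones).
# DummyClassifier ignores the feature values, so zeros are enough here.
X = np.zeros((10, 1))
y = np.array([0] * 7 + [1] * 3)

strategies = ["stratified", "most_frequent", "prior", "uniform", "constant"]
predictions = {}

for strategy in strategies:
    # Only the constant strategy needs to be told which class to predict.
    kwargs = {"constant": 1} if strategy == "constant" else {}
    clf = DummyClassifier(strategy=strategy, random_state=0, **kwargs)
    clf.fit(X, y)
    predictions[strategy] = clf.predict(X)
    print(strategy, predictions[strategy])
```

With this data, most_frequent and prior should always predict class 0 and constant should always predict class 1, while stratified and uniform produce a random mix of both classes.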

As you can see, some strategies of the DummyClassifier class always predict the same class. This can lead to warnings from some scikit-learn metrics functions. We will perform the same analysis as we did in the Computing precision, ...
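As a rough illustration of the warnings mentioned above, the following sketch computes precision for a classifier that never predicts the positive class. It assumes hypothetical toy data rather than the recipe's dataset, and uses precision_score as one example of a metric that is undefined in this situation.

```python
import warnings

import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import precision_score

# Hypothetical toy data: class 0 is the majority class.
X = np.zeros((10, 1))
y = np.array([0] * 7 + [1] * 3)

# most_frequent always predicts class 0, so class 1 is never predicted.
clf = DummyClassifier(strategy="most_frequent").fit(X, y)
y_pred = clf.predict(X)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Precision for class 1 has no predicted positives in its denominator.
    precision = precision_score(y, y_pred)

warning_names = [w.category.__name__ for w in caught]
print(precision)
print(warning_names)
```

Because no sample is predicted as class 1, the precision is ill-defined; scikit-learn reports it as zero and emits a warning instead of raising an error.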
