When your task is to predict one label out of more than two possible classes (for instance: What's the weather like today? Which flower is this? What's your job?), we call the problem multiclass classification. Multiclass classification is a very common task, and many performance metrics exist to evaluate classifiers. All of these measures can also be used for binary classification. Now, let's see how they work by using a simple, real-world example:
In: from sklearn import datasets
    iris = datasets.load_iris()
    # No cross-validation for this dummy notebook
    from sklearn.model_selection import train_test_split
    X_train, X_test, Y_train, Y_test = train_test_split(iris.data, iris.target, test_size=0.50, random_state=4)
    ...
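As a minimal sketch of how such metrics can be computed on this split, the snippet below fits a classifier and scores its predictions. The classifier (`DecisionTreeClassifier` with `max_depth=2` and `random_state=0`) and the two metrics shown (`confusion_matrix` and `accuracy_score`) are assumptions chosen for illustration; they are not necessarily what the elided part of the example uses.

```python
from sklearn import datasets
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Same data and split as in the example above
iris = datasets.load_iris()
X_train, X_test, Y_train, Y_test = train_test_split(
    iris.data, iris.target, test_size=0.50, random_state=4)

# Assumption: a shallow decision tree, used here only as a stand-in classifier
classifier = DecisionTreeClassifier(max_depth=2, random_state=0)
classifier.fit(X_train, Y_train)
Y_pred = classifier.predict(X_test)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(Y_test, Y_pred, labels=[0, 1, 2])
# Fraction of test samples whose predicted class matches the true class
acc = accuracy_score(Y_test, Y_pred)

print(cm)
print("Accuracy:", acc)
```

The diagonal of the confusion matrix counts correctly classified samples per class, so accuracy equals the sum of the diagonal divided by the total number of test samples.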