- To represent semi-supervised or censored data, we need to do a little data preprocessing. We'll walk through a simple example first, and then move on to some more difficult cases:
from sklearn import datasets
d = datasets.load_iris()
- Because we'll be modifying the data, let's make copies and add an unlabeled member to the copy of the target names. This will make the data easier to identify later:
import numpy as np
X = d.data.copy()
y = d.target.copy()
names = d.target_names.copy()
# Append the extra class name so that it sits at the last position
names = np.append(names, ['unlabeled'])
names
array(['setosa', 'versicolor', 'virginica', 'unlabeled'], dtype='|S10')
- Now, let's update y with -1, which is the marker for the unlabeled case. This is also why we added unlabeled at the end ...
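The step above can be sketched as follows. This is a minimal illustration rather than the exact code from the recipe. It assumes a stand-in target array with the same layout as the iris target (50 samples for each of three classes) so that it runs with NumPy alone. The random seed, the roughly 50% masking rate, and the name `unlabeled_mask` are illustrative assumptions.

```python
import numpy as np

# Stand-in for d.target: 50 samples each of classes 0, 1, and 2 (assumption)
y = np.repeat([0, 1, 2], 50)
original = y.copy()

# Class names with the extra 'unlabeled' entry appended at the end
names = np.append(np.array(['setosa', 'versicolor', 'virginica']), ['unlabeled'])

# Hide roughly half of the labels at random (seed and rate are illustrative)
rng = np.random.RandomState(0)
unlabeled_mask = rng.choice([True, False], size=len(y))
y[unlabeled_mask] = -1

# Index -1 wraps around to the last element of names, i.e. 'unlabeled',
# so names[y] maps every masked sample to the 'unlabeled' string
labels = names[y]
print(labels[:10])
```

Because `'unlabeled'` is the last element of `names`, the marker -1 doubles as a valid index into the array, so no special-casing is needed when translating targets back to names.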