We can now implement this model in Python using a simple two-dimensional dataset, created with the make_blobs() function provided by Scikit-Learn:
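import numpy as np
from sklearn.datasets import make_blobs

nb_samples = 1000
nb_unlabeled = 750

# Two Gaussian blobs in two dimensions; Y holds the true class (0 or 1)
X, Y = make_blobs(n_samples=nb_samples, n_features=2, centers=2, cluster_std=2.5, random_state=100)

# Pick 750 distinct indices at random and mark those samples as unlabeled (-1)
unlabeled_idx = np.random.choice(np.arange(0, nb_samples, 1), replace=False, size=nb_unlabeled)
Y[unlabeled_idx] = -1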
We have created 1,000 samples belonging to two classes. We then randomly selected 750 of them to form the unlabeled dataset, marking each one by setting its class to -1. We can now initialize two Gaussian distributions by defining their mean, covariance, and weight. ...
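As a rough illustration of what such an initialization could look like, the sketch below draws a random mean and a random covariance matrix for each Gaussian and assigns equal weights. It is not necessarily the initialization used in the rest of this example: the helper random_gaussian, the seed, the sampling range for the means (chosen to match the default center_box of make_blobs, which is (-10, 10)), and the names m1, c1, q1, m2, c2, q2 are all assumptions made for illustration.

```python
import numpy as np

# Arbitrary seed, used only to make this sketch reproducible (assumption)
rng = np.random.default_rng(1000)

def random_gaussian(rng, n_features=2):
    """Hypothetical helper: return a random mean and a valid covariance matrix."""
    # Means drawn from the same box that make_blobs uses for its centers by default
    mean = rng.uniform(-10.0, 10.0, size=n_features)
    # A @ A.T is symmetric positive semi-definite; adding the identity
    # makes it strictly positive definite, hence a valid covariance
    a = rng.uniform(-1.0, 1.0, size=(n_features, n_features))
    cov = a @ a.T + np.eye(n_features)
    return mean, cov

m1, c1 = random_gaussian(rng)
m2, c2 = random_gaussian(rng)

# Equal prior weights for the two components; they must sum to 1
q1, q2 = 0.5, 0.5
```

Building each covariance as a product of a matrix with its transpose is one simple way to guarantee symmetry and positive definiteness, which a Gaussian density requires; any other valid starting point would serve the same purpose.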