November 2017
Intermediate to advanced
374 pages
10h 19m
English
The batches are key here. MiniBatch k-means works through the data one batch at a time: on each iteration it computes the cluster means for the current batch, then updates the means carried over from earlier batches in light of that result. Several options govern the general k-means behavior, and additional parameters determine how MiniBatch k-means is updated.
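To make the update concrete, here is a minimal pure-Python sketch of one common form of the mini-batch rule, in which each center keeps a count of the points it has absorbed and moves toward every new point with a step size of 1/count. This is an illustration under that assumption, not scikit-learn's actual implementation; the helper names `nearest_center` and `minibatch_step` and the synthetic data are hypothetical.

```python
import random

def nearest_center(point, centers):
    """Return the index of the center closest to point (squared Euclidean)."""
    return min(range(len(centers)),
               key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centers[k])))

def minibatch_step(centers, counts, batch):
    """One mini-batch update, in place. Hypothetical helper, for illustration only."""
    # Assign the whole batch first, using the centers as they stand now.
    assignments = [nearest_center(x, centers) for x in batch]
    for x, k in zip(batch, assignments):
        counts[k] += 1
        eta = 1.0 / counts[k]  # per-center step size shrinks as the count grows
        # Running mean: blend the prior center with the new point.
        centers[k] = [(1.0 - eta) * c + eta * p for c, p in zip(centers[k], x)]

random.seed(0)
# Two well-separated 2-D groups, 500 points each (synthetic data).
data = ([[random.gauss(-5, 1), random.gauss(-5, 1)] for _ in range(500)] +
        [[random.gauss(5, 1), random.gauss(5, 1)] for _ in range(500)])
centers = [list(data[0]), list(data[500])]  # one starting point per group
counts = [0, 0]
for _ in range(20):
    minibatch_step(centers, counts, random.sample(data, 50))
print(centers)
```

Because the step size is 1/count, each center ends up as the running mean of every point assigned to it so far, which is the sense in which the prior batch mean is updated by the current batch.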
The batch_size parameter determines how large each batch is. Just for fun, let's run MiniBatch k-means again, this time with the batch size set to the size of the whole dataset:
minibatch = MiniBatchKMeans(batch_size=len(blobs))
%time minibatch.fit(blobs)

Wall time: 1min
MiniBatchKMeans(batch_size=1000000, compute_labels=True, init='k-means++', init_size=None, max_iter=100, max_no_improvement=10, n_clusters=8, n_init=3, ...
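For readers who want to try the comparison outside a notebook, the following is a hedged, self-contained sketch. It assumes the data comes from `make_blobs` (the construction of `blobs` is not shown in this section), uses a much smaller dataset so it finishes quickly, and replaces the `%time` magic with `time.perf_counter`. The sizes, seed, and the helper name `timed_fit` are illustrative choices, not the author's.

```python
import time

from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_blobs

# Assumed stand-in for the book's `blobs`: 20,000 points around 3 centers.
blobs, _ = make_blobs(n_samples=20_000, centers=3, random_state=0)

def timed_fit(batch_size):
    """Fit MiniBatchKMeans with the given batch size; return (model, seconds)."""
    model = MiniBatchKMeans(n_clusters=3, batch_size=batch_size,
                            n_init=3, random_state=0)
    start = time.perf_counter()
    model.fit(blobs)
    return model, time.perf_counter() - start

small, t_small = timed_fit(100)        # many small batches
full, t_full = timed_fit(len(blobs))   # a single batch as large as the dataset

print(f"batch_size=100:        {t_small:.3f}s, inertia={small.inertia_:.1f}")
print(f"batch_size=len(blobs): {t_full:.3f}s, inertia={full.inertia_:.1f}")
```

Timings depend on the machine and the scikit-learn version, so the printed numbers are best read as a relative comparison between the two settings.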