November 2017
Beginner to intermediate
366 pages
7h 59m
English
Clustering is the task of separating a set of observations (tuples) into groups (clusters) so that records within a cluster are similar to each other and records in different clusters are dissimilar. Several approaches exist for clustering data at rest. Streaming data, by contrast, keeps arriving at some rate, so we don't have the luxury of accessing the data randomly or making multiple passes over it. Many data stream clustering algorithms therefore use a two-phase scheme: an online component that processes incoming stream points and produces summary statistics, and an offline component that uses the summary data to generate the clusters.
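The two-phase scheme can be illustrated with a minimal Python sketch. It is not any particular published algorithm: the `MicroCluster` class, the `online_update` and `offline_cluster` functions, the fixed `radius` threshold, and the toy stream are all assumptions made for illustration. The online phase sees each point once and keeps only a count and a per-dimension sum per summary; the offline phase runs a weighted k-means over those summaries rather than over the raw points.

```python
import math
import random


class MicroCluster:
    """Hypothetical summary of nearby points: a count and a per-dimension sum."""

    def __init__(self, point):
        self.n = 1
        self.linear_sum = list(point)

    def centroid(self):
        return [s / self.n for s in self.linear_sum]

    def absorb(self, point):
        self.n += 1
        for i, x in enumerate(point):
            self.linear_sum[i] += x


def online_update(summaries, point, radius):
    """Online phase: fold one point into the summaries, then discard it."""
    nearest, best = None, float("inf")
    for mc in summaries:
        d = math.dist(mc.centroid(), point)
        if d < best:
            nearest, best = mc, d
    if nearest is not None and best <= radius:
        nearest.absorb(point)
    else:
        # No summary is close enough, so start a new one (assumed policy).
        summaries.append(MicroCluster(point))


def offline_cluster(summaries, k, iterations=10, seed=0):
    """Offline phase: weighted k-means over summary centroids only."""
    rng = random.Random(seed)
    points = [(mc.centroid(), mc.n) for mc in summaries]
    centers = [p for p, _ in rng.sample(points, k)]
    dims = len(centers[0])
    for _ in range(iterations):
        sums = [[0.0] * dims for _ in range(k)]
        weights = [0] * k
        for p, w in points:
            # Assign each summary to its nearest center, weighted by its count.
            j = min(range(k), key=lambda c: math.dist(centers[c], p))
            weights[j] += w
            for i, x in enumerate(p):
                sums[j][i] += w * x
        centers = [
            [s / weights[j] for s in sums[j]] if weights[j] else centers[j]
            for j in range(k)
        ]
    return centers


# Toy stream (made-up data): points near (0, 0) and near (10, 10).
stream = [(0.0, 0.1), (10.0, 10.2), (0.2, 0.0),
          (9.9, 10.0), (0.1, 0.2), (10.1, 9.8)]

summaries = []
for point in stream:  # single pass, no random access to earlier points
    online_update(summaries, point, radius=1.0)

centers = offline_cluster(summaries, k=2)
```

Because the offline step only touches the summaries, its cost depends on the number of micro-clusters rather than on the length of the stream, which is the main appeal of splitting the work this way.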