Summary

The chapter started with an overview of data at rest and data in motion, the latter also called streaming data. We then delved into the properties of streaming data and the challenges it poses for processing. We introduced stream clustering algorithms and discussed the well-known online/offline approach to stream clustering. Next, we introduced various classes in the stream package and showed how to use them. Along the way, we covered several data generators, the DBSTREAM algorithm for finding micro-clusters and macro-clusters, and several metrics for assessing cluster quality. We then introduced our use case and went on to design a clustering algorithm, with the online part based on reservoir sampling and the offline part ...
