Summary

The chapter started with an overview of data in motion, also called streaming data, and data at rest. We then delved into the properties of streaming data and the challenges it poses for processing. We introduced stream clustering algorithms and discussed the well-known offline/online approach to stream clustering. Later on, we introduced various classes in the stream package and showed how to use them. Along the way, we discussed several data generators, the DBSTREAM algorithm for finding micro- and macro-clusters, and several metrics to assess the quality of clusters. We then introduced our use case and went on to design a clustering algorithm, with the online part based on reservoir sampling and the offline part ...
