2 Histogram-Based Clustering of Sensor Network Data

In this chapter, we assume that a sensor network is used to monitor a physical phenomenon over time. Each sensor performs repeated measurements at such a high frequency that it is not possible to store all of the data on easily accessible media. We propose a clustering strategy that processes the incoming observations online in order to find groups of sensors that behave similarly over time. The proposed strategy consists of two phases: the online phase summarizes the incoming data, and the offline phase partitions the streams into clusters. In the online phase, the incoming observations are split into batches, and each subsequence in a batch is summarized by a histogram. A fast clustering algorithm is then run on these summaries to obtain a local partition of the subsequences. The offline phase finds a consensus partition starting from the local partitions obtained in this way. Through an application to real data, we show the effectiveness of our strategy in finding homogeneous groups of data streams.
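A minimal sketch of the two-phase pipeline described above, using only the Python standard library. Only the phase structure (batches, one histogram per subsequence, a local clustering per batch, a consensus over batches) comes from the text; everything else is an assumption made for illustration. These assumptions include a fixed bin range `[lo, hi]` shared by all sensors, an L1 distance between cumulative bin frequencies as a discrete stand-in for a Wasserstein-type distance, a small k-medoids routine in place of the unspecified "fast clustering algorithm", and a co-association (evidence-accumulation) consensus with a 0.5 threshold. The function names are hypothetical. For simplicity, stored lists simulate the stream and any trailing partial batch is ignored.

```python
from itertools import accumulate


def histogram(values, lo, hi, n_bins):
    """Relative-frequency histogram of `values` over n_bins equal-width bins on [lo, hi]."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        # Clamp so that v == hi (or out-of-range values) fall into an edge bin.
        idx = min(max(int((v - lo) / width), 0), n_bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def hist_distance(h1, h2):
    """Assumed distance: L1 gap between cumulative frequencies (Wasserstein-like)."""
    return sum(abs(a - b) for a, b in zip(accumulate(h1), accumulate(h2)))


def k_medoids(items, k, dist, n_iter=20):
    """Tiny k-medoids: farthest-point initialization, then alternate assign/update."""
    medoids = [0]
    while len(medoids) < k:
        # Next medoid: the item farthest from its nearest current medoid.
        far = max(range(len(items)),
                  key=lambda i: min(dist(items[i], items[m]) for m in medoids))
        medoids.append(far)
    labels = []
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: dist(x, items[medoids[c]])) for x in items]
        new = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            # New medoid: the member with the smallest total distance to the others;
            # an empty cluster keeps its old medoid.
            new.append(min(members, key=lambda i: sum(dist(items[i], items[j]) for j in members))
                       if members else medoids[c])
        if new == medoids:
            break
        medoids = new
    return labels


def consensus(local_partitions, n, threshold=0.5):
    """Co-association consensus: link sensors grouped together in > threshold of batches."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    b = len(local_partitions)
    for i in range(n):
        for j in range(i + 1, n):
            together = sum(p[i] == p[j] for p in local_partitions) / b
            if together > threshold:
                parent[find(i)] = find(j)
    # Relabel connected components as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


def cluster_streams(streams, batch_size, k, lo, hi, n_bins=10):
    n, length = len(streams), len(streams[0])
    local = []
    for start in range(0, length - batch_size + 1, batch_size):
        # Online phase: summarize each subsequence in this batch by a histogram,
        # then cluster the histograms to get a local partition.
        hists = [histogram(s[start:start + batch_size], lo, hi, n_bins) for s in streams]
        local.append(k_medoids(hists, k, hist_distance))
    # Offline phase: combine the local partitions into one consensus partition.
    return consensus(local, n)


# Toy data (made up): two "low" sensors and two "high" sensors.
streams = [
    [0.1, 0.2, 0.15, 0.1, 0.2, 0.1, 0.15, 0.2],
    [0.2, 0.1, 0.1, 0.15, 0.1, 0.2, 0.2, 0.1],
    [0.9, 0.8, 0.85, 0.9, 0.8, 0.95, 0.9, 0.85],
    [0.8, 0.9, 0.9, 0.85, 0.9, 0.8, 0.85, 0.9],
]
print(cluster_streams(streams, batch_size=4, k=2, lo=0.0, hi=1.0))  # [0, 0, 1, 1]
```

Only the per-batch histograms and local labels need to be kept, not the raw observations. This is what lets the online phase cope with data that cannot be stored in full. The consensus step here needs only how often each pair of sensors was grouped together.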

2.1. Introduction

Massive data sets in the form of continuous streams with no fixed length are becoming very common, owing to the availability of sensor networks that can repeatedly measure some variable at a very high frequency. We can think, for instance, of real-time data recorded by surveillance systems, of electricity consumption recording ...
