5. A Spatial Dependence Measure and Prediction of Georeferenced Data Streams Summarized by Histograms

5.1. Introduction

Massive datasets in the form of continuous streams of unbounded length are becoming very common, owing to the availability of sensor networks that can repeatedly measure one or more variables at a very high frequency.

Knowledge extraction from such data must take into account both the technological characteristics of the data acquisition tools and the nature of the monitored phenomenon.

Data acquisition is often performed by sensors with limited storage and processing resources. Moreover, communication among sensors is constrained by their physical distribution or by limited bandwidth. Finally, the recorded data often relate to rapidly evolving phenomena, which call for algorithms that update the extracted knowledge as new observations arrive.

The prevailing paradigm for analyzing data in this context is centralized data stream analysis: observations recorded by the sensors are collected and processed by a single unit that answers queries. In this setting, the processing unit must be efficient in both space and time. Data have to be processed on the fly, at the rate at which they are recorded, and algorithms must adapt their behavior over time, consistently with the dynamic nature of the data.
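
The space and time constraints just described can be made concrete with a small example. The following is a minimal Python sketch, not the method developed in this chapter, assuming a single sensor that emits real-valued readings, non-overlapping windows of a fixed number of observations, and fixed, ascending bin edges chosen in advance; the function name `window_histograms` and its parameters are hypothetical. Each reading is read once and then discarded, so memory depends only on the number of bins, not on the length of the stream.

```python
from bisect import bisect_right
from typing import Iterable, Iterator, List, Sequence


def window_histograms(
    stream: Iterable[float],
    window_size: int,
    edges: Sequence[float],
) -> Iterator[List[int]]:
    """Yield one histogram (list of bin counts) per full window of the stream.

    Hypothetical helper: `edges` must be sorted ascending and define
    len(edges) - 1 bins [e0, e1), [e1, e2), ..., [e_{k-1}, e_k]; values
    outside that range are clamped into the first or last bin.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    n_bins = len(edges) - 1
    if n_bins < 1:
        raise ValueError("need at least two bin edges")

    counts = [0] * n_bins  # the only per-window state: O(n_bins) memory
    seen = 0
    for x in stream:  # single pass: each reading is used once, then dropped
        b = bisect_right(edges, x) - 1
        b = min(max(b, 0), n_bins - 1)  # clamp out-of-range values
        counts[b] += 1
        seen += 1
        if seen == window_size:
            yield counts  # summary of the completed window
            counts = [0] * n_bins  # fresh counts for the next window
            seen = 0


if __name__ == "__main__":
    # Hypothetical readings from one sensor, summarized in windows of 4.
    readings = [1.0, 12.5, 25.0, 3.2, 8.8, 19.9, 30.0, 11.0]
    for h in window_histograms(readings, window_size=4, edges=[0, 10, 20, 30]):
        print(h)  # prints [2, 1, 1], then [1, 2, 1]
```

A trailing window that is not yet full is not emitted. Keeping only the counts of the current window is one simple way to let the summary follow an evolving phenomenon; overlapping windows or adaptive bin edges would track changes more finely at the cost of extra state.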

In the framework of distributed stream processing, this chapter deals with the monitoring of data stream ...
