Grid-based data mining with the Environmental Scenario Search Engine (ESSE)

Mikhail Zhizhin, Alexey Poyda, Dmitry Mishin, Dmitry Medvedev, Eric Kihn and Vassily Lyutsarev


The increasing data volumes from today's collection systems, together with the scientific community's need to include an integrated and authoritative representation of the natural environment in its analyses, require a new approach to data mining, management and access.

The natural environment includes elements from multiple domains such as space, terrestrial weather, oceans and terrain. Systems such as NASA's Global Change Master Directory (GCMD) and the DMSO's Master Environmental Library (MEL), among others, can search metadata by keywords and return a set of links to archived environmental data sets distributed across the network, but they cannot search for specific patterns within the data themselves.

The environmental modelling community has begun to develop several archives of continuous environmental representations. These archives contain a complete view of the Earth system parameters on a regular grid for a considerable period of time. The numerical models used to reproduce environmental parameters take all available observational data as initial conditions, so the resulting petabyte-size data sets may be considered an authoritative high-resolution representation of terrestrial weather during the last 50 years (Kalnay et al., 1996; Uppala et al., 2005). ...
