Data pre-processing using OGSA-DAI
Martin Swain and Neil P. Chue Hong
Data pre-processing and data management are challenging and essential features of grid-enabled data mining applications. Powerful data management grid middleware is available, but it has yet to be fully exploited by grid-based data mining systems. Here, the Open Grid Services Architecture – Data Access and Integration (OGSA-DAI) software is explored as a uniform framework for providing data services to support the data mining process. It is shown how the OGSA-DAI activity framework already provides powerful functionality to support data mining, and that this can be readily extended to provide new operations for specific data mining applications. This functionality is demonstrated by two application scenarios, which use complex workflows to access, integrate and pre-process distributed data sets. Finally, OGSA-DAI is compared with other available data handling solutions, and future issues in the field are discussed.
Data management grid middleware has evolved to address the issues of distributed, heterogeneous data collections held across dynamic virtual organizations (Finkelstein, Gryce and Lewis-Bowen, 2004; Laure, Stockinger and Stockinger, 2005). Many of the principles and technologies developed can also be used to assist in the manipulation of data for the purposes of data mining. The Open Grid Services Architecture – Data Access and Integration (OGSA-DAI) software (Antonioletti ...