6

FAEHIM: Federated Analysis Environment for Heterogeneous Intelligent Mining

Ali Shaikh Ali and Omer F. Rana

ABSTRACT

Data mining is the process of mapping large volumes of data onto more compact representations that can be used to support decision making. At the core of the process is the application of specific data mining methods for pattern discovery and extraction. The process is often structured as a series of interactive and iterative stages within a discovery pipeline or workflow. At the different stages of this pipeline, a user needs to access, integrate and analyse data from disparate sources, to use the data patterns and models generated by intermediate stages, and to feed these models to further stages in the pipeline.

The availability of Web service standards and their adoption by a number of communities, including the grid community, suggests that a data mining toolkit based on Web services is likely to be useful to a significant user community. Providing data mining algorithms as Web services also allows them to be integrated with third-party services and embedded within existing applications.

We present a data mining toolkit, called FAEHIM, that composes Web services using the widely deployed Triana workflow environment. Most of the Web services are derived from the Weka library of data mining algorithms.

6.1 Introduction

The capabilities of generating and collecting data have been growing in recent years. ...
