Scalable and privacy preserving distributed data analysis over a service-oriented platform

William K. Cheung


Scalability and data privacy are two main challenges that hinder distributed data analysis from being widely applied in collaborative projects. In this chapter, we first review a recently proposed scalable and privacy-preserving distributed data analysis approach. The approach computes abstractions of distributed data, which are then used for mining global data patterns. We then describe a service-oriented realization of the approach for data clustering and explain in detail how the analysis process is deployed on a BPEL platform for execution. Finally, we discuss lessons learned from the implementation exercise and future research directions on how distributed data analysis platforms can be built with even higher scalability and improved support for privacy preservation.

7.1 Introduction

With the advent of the Web and grid computing, distributed data are now much easier to access, and distributed computing in a heterogeneous environment is becoming much more feasible. In the past few years, a number of large-scale cross-disciplinary and cross-institution research collaboration projects have been launched in different application domains, including e-science, e-business and e-government, to name just a few. Distributed data analysis is commonly an important part of such projects. The underlying scalability and ...
