William K. Cheung
Scalability and data privacy are two main challenges that keep distributed data analysis from being widely applied in collaborative projects. In this chapter, we first review a recently proposed scalable and privacy-preserving approach to distributed data analysis. The approach computes abstractions of the distributed data, which are then used for mining global data patterns. We then describe a service-oriented realization of the approach for data clustering and explain in detail how the analysis process is deployed on a BPEL platform for execution. Finally, we discuss lessons learned from the implementation exercise and future research directions on how distributed data analysis platforms can be built with even higher scalability and improved support for privacy preservation.
With the advent of the Web and grid computing, distributed data have become much easier to access, and distributed computing in a heterogeneous environment has become much more feasible. In the past few years, a number of large-scale cross-disciplinary and cross-institutional research collaboration projects have been launched in application domains including e-science, e-business and e-government, to name just a few. Distributed data analysis is commonly an important part of such projects. The underlying scalability and ...