22

–––––––––––––––––––––––

Performance Optimization of Scientific Applications Using an Autonomic Computing Approach

Ioana Banicescu, Florina M. Ciorba, and Srishti Srivastava

22.1   INTRODUCTION

In most scientific applications, parallel loops are the main source of parallelism. To obtain high performance and to take advantage of parallelism in such applications, which are in general large, computationally intensive, and data parallel, these loops are executed on multiple processors. Naively allocating an equal number of loop iterations to each processor almost always delivers unsatisfactory application performance. Performance degradation is mostly due to factors such as interprocessor communication overhead, unequal processor capabilities, differences in processor load, and processor synchronization, among others. The overhead related to differences in processor load, or load imbalance, is in many cases the dominant factor: it causes the processors to finish executing their loop iterations at widely different times, with some processors sitting idle while others remain heavily loaded. Load imbalance is caused by the interactive effects of irregularities in problem, algorithmic, and systemic characteristics ([1], Chapter 4). Problem irregularities are mainly brought about by a nonuniform distribution of application data among processors, while algorithmic irregularities are often due to different conditional execution paths ...
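
The effect of an equal-count allocation on a loop with nonuniform iteration costs can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the chapter: the function names `static_block_schedule` and `idle_fraction`, the synthetic cost profile (iteration i costs i + 1 time units), and the choice of four identical processors. It ignores communication and synchronization overhead and models only the load differences described above.

```python
def static_block_schedule(costs, num_procs):
    """Assign contiguous blocks of (nearly) equal iteration COUNT to each
    processor and return the total work time each processor receives.
    Hypothetical helper for illustration only."""
    n = len(costs)
    base, extra = divmod(n, num_procs)
    loads = []
    start = 0
    for p in range(num_procs):
        # The first `extra` processors take one additional iteration.
        size = base + (1 if p < extra else 0)
        loads.append(sum(costs[start:start + size]))
        start += size
    return loads


def idle_fraction(loads):
    """Fraction of total processor time spent idle while waiting for the
    most heavily loaded processor to finish."""
    makespan = max(loads)
    capacity = makespan * len(loads)
    return (capacity - sum(loads)) / capacity


# Assumed workload: 16 iterations whose cost grows linearly with the index.
costs = [i + 1 for i in range(16)]
loads = static_block_schedule(costs, num_procs=4)

print("per-processor load:", loads)
print("idle fraction: %.3f" % idle_fraction(loads))
```

Each processor receives four iterations, yet the last processor carries several times the work of the first, so the others wait idle until it finishes. Under these assumptions, an equal iteration count does not imply an equal load.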
