CHAPTER 7
Performance Tuning and Optimization
The UPC language was designed specifically to let programmers obtain the best performance from a wide range of parallel computer architectures, and many features of the UPC language and toolset aid in this task. To obtain the best performance, however, a programmer needs some basic knowledge of the characteristics, system architecture, and performance of parallel computers. The chapter therefore begins with a brief primer on parallel machines and on general performance issues in parallel programming; many texts treat the subjects referred to in this chapter in greater depth. Following this introductory overview of parallel system architecture, three critical factors in achieving performance with UPC are discussed:
- The UPC compiler, which analyzes the UPC application program and applies a variety of techniques to produce efficient executable code
- The UPC run-time system, which executes programs and also observes their dynamic behavior so that it can take actions to improve performance at run time
- Hand optimizations performed by programmers to enhance application performance
With this arsenal of language constructs and software tools, the programmer has the means to craft parallel programs that perform well. The UPC programming model offers considerable flexibility in control, which makes several important optimizations possible. Some specific techniques for achieving these ...