Increasing Performance through Optimization on APU

Matthew Doerksen, Parimala Thulasiraman, and Ruppa Thulasiram


As we move into the exascale era of computing, heterogeneous architectures have become an integral component of high-performance systems (HPSs) and high-performance computing (HPC). Over time, we have transitioned from homogeneous, central processing unit (CPU)-centric HPSs such as Jaguar [1] to heterogeneous HPSs such as Roadrunner [2], which uses a modified Cell processor, and Tianhe-1A [3], which is based on graphics processing units (GPUs). These HPSs have been vital for research applications but, until recently, have not been a factor in the consumer-level experience. However, with new technologies such as AMD’s accelerated processing unit (APU) architecture, which fuses the CPU and the GPU onto a single chip, consumers now have an affordable HPS at their disposal.


To begin, we provide a basic overview of the types of heterogeneous architectures currently available. A short list includes the Cell Broadband Engine (Cell BE) [4], GPUs from AMD [5] and NVIDIA, and AMD’s Fusion APU [6]. Each of these architectures has its own advantages and disadvantages that, in part, determine how well it performs in a particular situation or for a particular algorithm.


FIGURE 27.1.    
