Chapter 24

Profiling-Guided Optimization

Andrey Vladimirov, Colfax International, USA

Abstract

This chapter focuses on matrix transposition, a small, self-contained workload of great practical value. The optimization process relies exclusively on programming in a high-level language with the OpenMP framework. The result is portable code that runs on both CPU (processor) and MIC (coprocessor) architectures and can be recompiled for future generations of Intel architectures. The chapter emphasizes using Intel® VTune™ Amplifier XE reports to determine where to apply optimization. Through VTune, the performance monitoring functionality of Intel Xeon Phi coprocessors is showcased not only ...
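
The abstract does not reproduce the chapter's code. As a rough illustration of the kind of kernel involved, here is a minimal sketch, not the chapter's implementation, assuming a square n × n matrix of doubles stored in row-major order and transposed in place. The function names and the tile size TILE are hypothetical; in practice a parameter like TILE would be tuned using profiling data. The OpenMP pragma distributes rows of tiles across threads; if the code is compiled without OpenMP support, the pragma is ignored and the loop runs serially with the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile size (assumption): the best value depends on the
   cache hierarchy of the target and would be chosen by profiling. */
#define TILE 32

/* Swap A[i][j] and A[j][i] in a square, row-major n x n matrix. */
static void swap_elems(double *A, size_t n, size_t i, size_t j) {
    double t = A[i * n + j];
    A[i * n + j] = A[j * n + i];
    A[j * n + i] = t;
}

/* Hypothetical in-place transposition of an n x n row-major matrix,
   processed in TILE x TILE blocks for cache locality. Only tiles on or
   above the diagonal are visited, and each one is swapped with its
   mirror tile, so no two loop iterations touch the same element. */
void transpose_inplace(double *A, size_t n) {
    assert(TILE > 0);
    size_t nt = (n + TILE - 1) / TILE; /* number of tiles per dimension */
#pragma omp parallel for schedule(dynamic)
    for (size_t bi = 0; bi < nt; bi++) {
        for (size_t bj = bi; bj < nt; bj++) {
            size_t i0 = bi * TILE, j0 = bj * TILE;
            size_t i1 = (i0 + TILE < n) ? i0 + TILE : n;
            size_t j1 = (j0 + TILE < n) ? j0 + TILE : n;
            for (size_t i = i0; i < i1; i++) {
                /* On a diagonal tile, swap only elements above the diagonal. */
                size_t jstart = (bi == bj) ? i + 1 : j0;
                for (size_t j = jstart; j < j1; j++)
                    swap_elems(A, n, i, j);
            }
        }
    }
}
```

The outer loop covers a triangle of tiles, so later rows of tiles carry less work; schedule(dynamic) is one simple way to keep threads balanced under that assumption.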

From High Performance Parallelism Pearls, Volume One.
