CUDA Fortran for Scientists and Engineers by Massimiliano Fatica, Gregory Ruetsch

Chapter 2

Performance Measurement and Metrics

Abstract

A prerequisite to performance optimization is a means to accurately time portions of a code and then to use that timing information to assess code performance. In this chapter we first discuss how to time kernel execution using CPU timers, CUDA events, the Command Line Profiler, and the nvprof profiling tool. We then discuss how timing information can be used to determine the limiting factor of kernel execution. Finally, we discuss how to calculate performance metrics, especially those related to bandwidth, and how such metrics should be interpreted.

Keywords

Timing; Performance metrics; CUDA events; Profiling; Bandwidth; Arithmetic throughput; Synchronization
