OpenGL Insights by Christophe Riccio, Patrick Cozzi

Performance State Tracking
Aleksandar Dimitrijević
37.1 Introduction
Reducing power consumption and dissipation is one of the predominant goals of
all modern integrated circuit designs. Besides the design-time optimizations, all
CPU/GPU vendors implement various real-time methods to reduce power consumption
while preserving acceptable performance. One of the consequences of power
management is a dynamic change in working frequencies and, hence, the overall
performance capabilities of the system. Modern GPUs, for both desktop and mobile
platforms, can be very aggressive in changing working frequencies according to the
current load.
Consider a simple case of rendering a triangle on a system with an NVIDIA
GeForce GTX 470 graphics card. NVIDIA drivers raise the frequencies to the
highest level instantly if they detect a 3D application. Even creating an OpenGL
rendering context is enough to make the GPU enter the highest performance state.
The moment the application starts, the GPU frequency is 607.5 MHz, while the
memory IO bus frequency is 1674 MHz. The frame rendering time is less than 0.16
ms for a full HD screen with 8x MSAA, and the GPU utilization is about 0%. After
a dozen seconds, since the utilization is extremely low, the GPU enters a lower
performance state. The frame rendering time changes to about 0.24 ms. Since the
GPU remains at low utilization, the performance is further reduced. After changing
four performance levels, the GPU finally enters the lowest performance state with the
GPU frequency at 50.5 MHz and the memory IO bus frequency at 101 MHz. The
rendering capabilities are reduced by an order of magnitude, while the frame rendering
time rises to 1.87 ms. If we do not track the performance state, we are not able to
interpret measured results correctly. Furthermore, for less demanding applications,
it is possible to get a shorter execution time on some older and less powerful graphics
cards because their lower performance states may involve much higher frequencies.
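To make the scale of the effect concrete, the figures quoted above can be put side by side. The following is only a minimal sketch: the struct and function names are hypothetical, and because the text gives the highest-state frame time as "less than 0.16 ms", the frame-time ratio computed here is a lower bound.

```cpp
#include <cassert>

// Figures quoted in the text for the GeForce GTX 470 example.
// All names below are illustrative and not part of any vendor API.
struct StateSample {
    double gpuClockMHz;
    double memClockMHz;
    double frameTimeMs;
};

// Highest performance state; 0.16 ms is the upper bound given in the text.
const StateSample kHighestState = {607.5, 1674.0, 0.16};
// Lowest performance state reached after four step-downs.
const StateSample kLowestState = {50.5, 101.0, 1.87};

// How many times faster the GPU clock runs in the faster state.
inline double gpuClockRatio(const StateSample& fast, const StateSample& slow) {
    assert(slow.gpuClockMHz > 0.0);
    return fast.gpuClockMHz / slow.gpuClockMHz;
}

// How many times longer a frame takes in the slower state.
inline double frameTimeRatio(const StateSample& fast, const StateSample& slow) {
    assert(fast.frameTimeMs > 0.0);
    return slow.frameTimeMs / fast.frameTimeMs;
}
```

With these numbers the GPU clock ratio is about 12.0 and the frame-time ratio is about 11.7, which matches the order-of-magnitude slowdown described above.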
37.2 Power Consumption Policies
For many years, graphics card vendors have been developing a highly advanced form
of dynamic power management (DPM). DPM estimates the relative workload and
aggressively conserves power when the workload is low. Power consumption is con-
trolled by changing voltage levels, GPU frequencies, and memory-clock frequencies.
A set of values that define the current power consumption and performance capabil-
ities of the graphics card is known as a performance state (P-state).
NVIDIA defines sixteen P-states, where P0 is the highest P-state, and P15 is the
idle state. Not all P-states are present on a given system. The state P0 is activated
whenever a 3D application is detected. If the utilization is below some threshold for
a certain period of time, the P-state is changed to a lower level.
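The step-down behavior just described can be pictured as a small state machine. The sketch below is a toy model only, not the driver's actual algorithm: the class name, the utilization threshold, and the hold time are all assumptions made for illustration, and it steps through every index even though not all P-states are present on a real system.

```cpp
#include <cassert>

// Toy model of the policy described in the text: jump to P0 when a 3D
// application is detected, then step one state lower each time utilization
// stays below a threshold for a given period. Threshold and hold time are
// made-up parameters; the real driver values are not given in the text.
class StepDownGovernor {
public:
    StepDownGovernor(int lowestState, double utilThreshold, double holdSeconds)
        : lowest_(lowestState), threshold_(utilThreshold), hold_(holdSeconds),
          state_(lowestState), idleSeconds_(0.0) {
        assert(lowestState >= 0 && holdSeconds > 0.0);
    }

    // A 3D application (or an OpenGL context) was detected: go straight to P0.
    void on3DApplicationDetected() {
        state_ = 0;
        idleSeconds_ = 0.0;
    }

    // Advance the model by dt seconds with the given utilization in [0, 1].
    void update(double utilization, double dt) {
        assert(dt >= 0.0);
        if (utilization >= threshold_) {
            idleSeconds_ = 0.0;  // busy enough: restart the low-load timer
            return;
        }
        idleSeconds_ += dt;
        if (idleSeconds_ >= hold_ && state_ < lowest_) {
            ++state_;            // higher index means lower performance
            idleSeconds_ = 0.0;
        }
    }

    int state() const { return state_; }

private:
    int lowest_;
    double threshold_;
    double hold_;
    int state_;
    double idleSeconds_;
};
```

In this model a profiler that samples `state()` alongside each frame time can tell whether a change in frame time coincides with a state change.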
AMD defines three P-states, where P0 is the lowest and P2 is the highest
performance state. P0 is the starting state, and it is changed only by demanding
applications. The latest AMD technology, known as PowerTune [AMD 10], defines a
whole range of working frequencies in the highest P-state. When the GPU reaches
the thermal design power (TDP) limits, the GPU frequency is gradually decreased
while maintaining the high power state. This enables much better performance for
demanding applications, while preserving an acceptable power level.
With such advanced power-management scenarios in mind, a fair comparison
of different rendering algorithms cannot be done on a frame-rate basis only. If the
same or an even lower frame rate is achieved in the lower P-state, it certainly qualifies
the algorithm as more efficient, or at least less demanding. That is why P-state
tracking is an important part of profiling software. So far, OpenGL doesn't have a
capability to track P-states; thus, we will take a look at how it can be implemented
using vendor-specific APIs: NVAPI for NVIDIA and ADL for AMD hardware.
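One way a profiler might act on this is to tag every timed frame with the P-state that was active and to aggregate timings per state, so that averages are only compared within the same state. The following is a minimal sketch under that assumption; the types are hypothetical, and the integer state index stands in for whatever value a vendor API reports.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// One timed frame, tagged with the P-state active when it was measured.
// The integer index is a placeholder for a vendor-reported state ID.
struct FrameSample {
    int pState;
    double frameTimeMs;
};

// Running totals for all frames observed in a single P-state.
struct StateStats {
    std::size_t frames = 0;
    double totalMs = 0.0;

    double meanMs() const { return frames ? totalMs / frames : 0.0; }
};

// Collects frame timings grouped by P-state (hypothetical helper class).
class PStateProfile {
public:
    void add(const FrameSample& s) {
        assert(s.frameTimeMs >= 0.0);
        StateStats& st = stats_[s.pState];
        ++st.frames;
        st.totalMs += s.frameTimeMs;
    }

    const std::map<int, StateStats>& perState() const { return stats_; }

private:
    std::map<int, StateStats> stats_;
};
```

Keeping the per-state buckets separate means a run that spent most of its time in a low state is not mistaken for a slow algorithm.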
37.3 P-State Tracking Using NVAPI
NVAPI is NVIDIA's core API that allows direct access to NVIDIA drivers on all
Microsoft Windows platforms [NVIDIA 11c], and it is shipped as a DLL with
the drivers.¹ NVAPI has to be statically linked to an application; hence, a
software development kit (SDK) has been released with an appropriate static library and
¹ The official documentation states that NVAPI is supported by drivers since Release 81.20 (R81.20),
but there are problems in accessing most of its functionality through the SDK in pre-R195 drivers. The
first NVAPI SDK was released with R195 in October 2009. Since R256 drivers, all settings have become
wide open to change with NVAPI.
