6 Monitoring and Controlling Power and Performance of Servers and Data Centers

This chapter presents techniques to monitor and control the power and performance of IT devices (focusing on servers) and of the data center infrastructure itself, with its pumps, chillers and so on.

When changes are made to a system, it is important to understand their potential impact on the system's behavior, and therefore to have models or tools that can predict that impact. Such tools can, for example, predict the impact of a frequency change on a server's performance or energy consumption, or the impact of a cooling change on the data center's power usage effectiveness (PUE). We first present the low-level components and application programming interfaces (APIs) used to measure the power and performance of servers equipped with Intel Xeon processors and NVIDIA accelerators, then some modeling techniques to predict the power and performance of servers, and finally high-level software to manage and control the power and performance of servers in data centers.
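The frequency and cooling examples in the previous paragraph rest on two simple quantities. PUE is conventionally defined as the total energy consumed by the facility divided by the energy delivered to the IT equipment, and the energy a job consumes is its average power multiplied by its runtime. The minimal sketch below computes both; the function names and all numbers are hypothetical, chosen only to show that lowering a processor's frequency saves energy only when the drop in power outweighs the increase in runtime.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power usage effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


def energy_joules(avg_power_watts: float, runtime_seconds: float) -> float:
    """Energy to solution: average power multiplied by elapsed time."""
    if avg_power_watts < 0 or runtime_seconds < 0:
        raise ValueError("power and runtime must be non-negative")
    return avg_power_watts * runtime_seconds


if __name__ == "__main__":
    # Hypothetical facility: 1500 kWh total, of which 1000 kWh reach IT equipment.
    print(f"PUE = {pue(1500.0, 1000.0):.2f}")

    # Hypothetical job run at a high and a lower CPU frequency:
    # lower frequency draws less power but runs longer.
    high_freq = energy_joules(avg_power_watts=300.0, runtime_seconds=100.0)
    low_freq = energy_joules(avg_power_watts=250.0, runtime_seconds=115.0)
    print(f"high frequency: {high_freq:.0f} J, low frequency: {low_freq:.0f} J")
```

With these made-up figures the lower frequency saves energy; had the runtime grown past 120 s, it would have cost more energy than the higher frequency, which is why predictive models are needed before changing the setting.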

6.1. Monitoring power and performance of servers

Accurate measurement and monitoring are a mandatory step before controlling a system's behavior. We first discuss the sensors and related APIs used to measure power and temperature, and then how to monitor performance.

6.1.1. Sensors and APIs for power and thermal monitoring on servers

Power and thermal measurements are error-prone, since their accuracy can vary widely depending on the granularity and the accuracy of ...
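As an illustration of how granularity enters a software power reading: on Linux servers with Intel Xeon processors, one commonly used source is the RAPL (Running Average Power Limit) energy counter exposed through the powercap sysfs interface. The counter is cumulative and wraps around, so average power has to be derived from two readings, and the sampling interval sets the time resolution of the result. This is a minimal sketch, assuming the package-0 domain path `/sys/class/powercap/intel-rapl:0` (the path varies by platform), that at most one wraparound occurs between samples, and that the process may read `energy_uj` (recent kernels restrict it to root). The helper names are hypothetical.

```python
import time
from pathlib import Path

# Assumption: Linux host exposing Intel RAPL via powercap; path varies by platform.
RAPL_DOMAIN = Path("/sys/class/powercap/intel-rapl:0")


def read_uj(name: str, domain: Path = RAPL_DOMAIN) -> int:
    """Read a microjoule value (e.g. energy_uj, max_energy_range_uj) from a RAPL domain."""
    return int((domain / name).read_text().strip())


def average_power_watts(e0_uj: int, e1_uj: int, dt_s: float, max_range_uj: int) -> float:
    """Average power between two cumulative energy readings taken dt_s seconds apart.

    The counter wraps around at max_range_uj, so a second reading smaller than the
    first is treated as exactly one wrap (assumes a short enough sampling interval).
    """
    if dt_s <= 0:
        raise ValueError("sampling interval must be positive")
    delta_uj = e1_uj - e0_uj
    if delta_uj < 0:
        delta_uj += max_range_uj
    return delta_uj / 1e6 / dt_s  # microjoules -> joules, then joules per second


if __name__ == "__main__":
    try:
        max_range = read_uj("max_energy_range_uj")
        e0, t0 = read_uj("energy_uj"), time.monotonic()
        time.sleep(1.0)  # the sampling interval: shorter means finer but noisier
        e1, t1 = read_uj("energy_uj"), time.monotonic()
        print(f"package 0: {average_power_watts(e0, e1, t1 - t0, max_range):.1f} W")
    except OSError as exc:  # interface missing or insufficient privileges
        print(f"cannot read RAPL counters: {exc}")
```

The wraparound check matters in practice: on a busy server the counter can overflow within minutes, and a naive subtraction would then report a large negative power.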
