Chapter 8. Monitoring Performance: Challenges and Solutions


System design and tuning aren’t the only aspects of multi-tenant distributed systems that require different treatment than traditional single-node systems or data centers of machines working independently. Monitoring (detection and diagnosis of problems) is also fundamentally different for distributed systems, especially multi-tenant systems for which the nature of the workload can change dramatically over time.

Traditional system administration makes use of a variety of tools for understanding the performance of and debugging problems on a single node, such as the following in Linux:


top

Displays a regularly updated page of current information about hardware use, both for the node as a whole and per process, focusing on CPU and memory usage.


iotop

Similar to top but reports on disk I/O.


iostat

Generates a report on CPU statistics and input/output statistics for devices, partitions, and network file systems.

ss and ip

Reports on network information for a node, such as sockets, connections, routing, and devices.


sar

Regularly collects and reports on a wide variety of system metrics for the node overall.

The /proc file system

A virtual file system that provides a convenient and structured way to access process data stored in the kernel’s internal data structures.
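As a rough illustration of how the /proc file system exposes kernel data as plain text, the following Python sketch reads memory statistics from /proc/meminfo. The function names parse_meminfo and read_meminfo are hypothetical helpers introduced here, not part of any standard tool, and the sketch assumes a Linux node on which /proc is mounted.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into a {field name: integer value} dict.

    Most fields are reported by the kernel in kB; a few (such as the
    HugePages_* counters) are plain counts with no unit.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip any line that is not "Name: value [unit]"
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def read_meminfo(path: str = "/proc/meminfo") -> dict:
    """Read and parse the node's memory statistics (Linux only)."""
    return parse_meminfo(Path(path).read_text())


if __name__ == "__main__":
    # Only attempt the read where /proc is actually available.
    if Path("/proc/meminfo").exists():
        info = read_meminfo()
        print("MemTotal (kB):", info.get("MemTotal"))
        print("MemAvailable (kB):", info.get("MemAvailable"))
```

Because the data is ordinary text, the same approach works for other entries under /proc, although each file has its own layout and needs its own parsing.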

These tools, along with log files from a machine, are generally used after an operator has identified a particular machine as having slow ...
