Chapter 8. Monitoring Performance: Challenges and Solutions
Introduction
System design and tuning aren’t the only aspects of multi-tenant distributed systems that require different treatment than traditional single-node systems or data centers of machines working independently. Monitoring (detection and diagnosis of problems) is also fundamentally different for distributed systems, especially multi-tenant systems for which the nature of the workload can change dramatically over time.
Traditional system administration relies on a variety of tools for understanding performance and debugging problems on a single node, such as the following on Linux:
- top
Displays a regularly updated page of current information about hardware use, both for the node as a whole and per-process, focusing on CPU and memory usage.
- iotop
Similar to top but reports on disk I/O.
- iostat
Generates a report on CPU statistics and input/output statistics for devices, partitions, and network file systems.
- ss and ip
Reports on network information for a node, such as sockets, connections, routing, and devices.
- sar
Regularly collects and reports on a wide variety of system metrics for the node overall.
- The /proc file system
A virtual file system that provides a convenient and structured way to access process data stored in the kernel’s internal data structures.
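As a small illustration of the /proc entry above, the following Python sketch reads memory figures from /proc/meminfo, one of the files the kernel exposes there. It is not taken from the book: the function names parse_meminfo and memory_used_fraction are hypothetical, and the sketch assumes a Linux kernel recent enough to report the MemAvailable field (3.14 or later).

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field_name: integer} dict.

    Values are in kB on lines that carry a "kB" suffix and are plain
    counts otherwise (for example HugePages_Total).
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Name: value"
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def memory_used_fraction(fields):
    """Fraction of RAM that is not available to new workloads.

    Assumption: both MemTotal and MemAvailable are present in `fields`.
    """
    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    return (total - available) / total


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():  # the file exists only on Linux
        info = parse_meminfo(meminfo.read_text())
        print(f"memory in use: {memory_used_fraction(info):.1%}")
```

Keeping the parser separate from the file read means the same function can be applied to text collected from many nodes, not only the local machine.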
These tools, along with log files from a machine, are generally used after an operator has identified a particular machine as having slow ...