System design and tuning aren’t the only aspects of multi-tenant distributed systems that require different treatment than traditional single-node systems or data centers of machines working independently. Monitoring (detection and diagnosis of problems) is also fundamentally different for distributed systems, especially multi-tenant systems for which the nature of the workload can change dramatically over time.
Traditional system administration relies on a variety of tools for understanding performance and debugging problems on a single node, such as the following on Linux:
top: Displays a regularly updated page of current information about hardware use, both for the node as a whole and per process, focusing on CPU and memory usage.
iotop: Similar to top, but reports on disk I/O.
iostat: Generates a report on CPU statistics and input/output statistics for devices, partitions, and network file systems.
netstat: Reports on network information for a node, such as sockets, connections, routing, and devices.
sar: Regularly collects and reports on a wide variety of system metrics for the node overall.
/proc: A virtual file system that provides a convenient and structured way to access process data stored in the kernel’s internal data structures.
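As one illustration of the per-node data these tools expose, the following minimal sketch reads /proc/meminfo from the Linux /proc virtual file system and estimates the fraction of memory in use. The function names parse_meminfo and memory_used_fraction are hypothetical, chosen only for this example, and the sketch assumes a kernel recent enough to report the MemAvailable field (roughly Linux 3.14 and later).

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integers.

    Values are in kB where the kernel prints a unit; a few fields
    (for example HugePages_Total) are plain counts.
    """
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Key: value"
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def memory_used_fraction(meminfo):
    """Estimate memory in use as (MemTotal - MemAvailable) / MemTotal.

    Assumption: MemAvailable is present; older kernels do not report it.
    """
    total = meminfo["MemTotal"]
    available = meminfo["MemAvailable"]
    return (total - available) / total


if __name__ == "__main__":
    path = Path("/proc/meminfo")
    if path.exists():
        info = parse_meminfo(path.read_text())
        print(f"memory in use: {memory_used_fraction(info):.1%}")
    else:
        print("/proc/meminfo not found; this sketch assumes Linux")
```

A script like this reports on one machine only, so an operator still has to know which node to run it on.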
These tools, along with log files from a machine, are generally used after an operator has identified a particular machine as having slow ...