Chapter 8. Monitoring Performance: Challenges and Solutions

Introduction

System design and tuning aren’t the only aspects of multi-tenant distributed systems that must be handled differently from traditional single-node systems or data centers of machines working independently. Monitoring, meaning the detection and diagnosis of problems, is also fundamentally different for distributed systems, especially multi-tenant systems whose workload can change dramatically over time.

Traditional system administration relies on a variety of tools for understanding performance and debugging problems on a single node, such as the following on Linux:

top

Displays a regularly updated page of current information about hardware use, both for the node as a whole and per-process, focusing on CPU and memory usage.

iotop

Similar to top but reports on disk I/O.

iostat

Generates a report on CPU statistics and input/output statistics for devices, partitions, and network file systems.

ss and ip

Report network information for a node, such as sockets, connections, routing, and devices.

sar

Regularly collects and reports on a wide variety of system metrics for the node overall.

The /proc file system

A virtual file system that provides a convenient and structured way to access process data stored in the kernel’s internal data structures.
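As an illustration of how the /proc file system exposes per-process data, the following is a minimal Python sketch, not taken from the book, that reads /proc/<pid>/status and turns its "Key: value" lines into a dictionary. The function names parse_status and read_process_status are hypothetical helpers introduced here; the sketch assumes a Linux host and Python 3.9 or later.

```python
from pathlib import Path


def parse_status(text: str) -> dict[str, str]:
    """Parse the 'Key:<tab>value' lines of a /proc/<pid>/status file into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore any line that has no colon
            fields[key.strip()] = value.strip()
    return fields


def read_process_status(pid: str = "self") -> dict[str, str]:
    """Read and parse /proc/<pid>/status (Linux only; hypothetical helper)."""
    return parse_status(Path("/proc", pid, "status").read_text())


if __name__ == "__main__":
    status = read_process_status()
    # VmRSS is the resident set size in kB; kernel threads do not report it.
    print(status.get("Name"), status.get("VmRSS", "n/a"))
```

Tools such as top gather much of their per-process information from files like this one, so a short script of this kind can be handy when only one or two fields are needed.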

These tools, along with log files from a machine, are generally used after an operator has identified a particular machine as having slow ...
