Chapter 5. Operations in the World of Containers
“You were not in control. You had no visibility: maybe there was a car in front of you, maybe not.”
Alain Prost, Formula One Driver
With continuous delivery, the build pipeline provides confidence that new or modified application features will work correctly at both the functional and nonfunctional levels. However, this is only part of the story for continuously delivering valuable software to users: as soon as an application is released to production, it must be monitored for signs of malfunction, performance issues, and security vulnerabilities.
Starting at the host level, any machine running production containers (and applications) must provide hardware- and OS-level metrics. All cloud vendors provide an API for obtaining metrics such as CPU usage and disk and network I/O performance, and such an API is relatively easy to create and expose when running on bare metal (for example, using sar/sysstat). All Linux-based OSes also expose useful OS-level metrics, such as the number of running processes, run queue averages, and swap space usage.
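As a rough illustration of the kind of OS-level metrics mentioned above, the following minimal sketch reads load averages, swap usage, and a process count directly from the Linux /proc filesystem rather than through sar/sysstat. The function names (parse_loadavg, parse_swap_used_kb, collect_host_metrics) and the shape of the returned dictionary are assumptions made for this example; only the /proc/loadavg and /proc/meminfo file formats come from Linux itself.

```python
from pathlib import Path


def parse_loadavg(text: str) -> dict:
    """Parse the contents of /proc/loadavg.

    Expected format: "<1min> <5min> <15min> <runnable>/<total> <last_pid>"
    """
    fields = text.split()
    runnable, total = fields[3].split("/")
    return {
        "load_1m": float(fields[0]),
        "load_5m": float(fields[1]),
        "load_15m": float(fields[2]),
        "runnable_entities": int(runnable),
        "total_entities": int(total),
    }


def parse_swap_used_kb(text: str) -> int:
    """Compute used swap (in kB) from the contents of /proc/meminfo."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            # The first token is the numeric value; a "kB" unit may follow.
            values[key.strip()] = int(parts[0])
    return values["SwapTotal"] - values["SwapFree"]


def collect_host_metrics(proc_root: str = "/proc") -> dict:
    """Gather a small set of host metrics (hypothetical helper for this sketch)."""
    root = Path(proc_root)
    metrics = parse_loadavg((root / "loadavg").read_text())
    metrics["swap_used_kb"] = parse_swap_used_kb((root / "meminfo").read_text())
    # Each process appears as a directory under /proc whose name is its PID.
    metrics["process_count"] = sum(1 for p in root.iterdir() if p.name.isdigit())
    return metrics
```

On a Linux host, calling collect_host_metrics() with no arguments reads the live /proc tree. The proc_root parameter exists only so the sketch can be pointed at a directory of sample files for testing.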
Regardless of where metrics originate, they should be collected and processed centrally, for example with Prometheus, the InfluxData TICK stack, or a SaaS-based offering like Datadog. Centralization not only provides a single place for developers and operators to manually monitor metrics, but also enables core alerts and automated warnings to be defined. ...
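To make the benefit of centralization concrete, here is a minimal sketch of how alert rules might be evaluated once samples from many hosts sit in one place. It is not how Prometheus, the TICK stack, or Datadog implement alerting; the Sample and AlertRule types, the metric names, and the simple "value exceeds threshold" rule are all assumptions chosen for illustration.

```python
from typing import List, NamedTuple, Tuple


class Sample(NamedTuple):
    """One metric reading reported by a host (hypothetical type)."""
    host: str
    metric: str
    value: float


class AlertRule(NamedTuple):
    """Fire when a metric's value exceeds the threshold (hypothetical type)."""
    name: str
    metric: str
    threshold: float


def evaluate(rules: List[AlertRule], samples: List[Sample]) -> List[Tuple[str, str, float]]:
    """Return (rule name, host, value) for every sample that breaches a rule."""
    firing = []
    for rule in rules:
        for sample in samples:
            if sample.metric == rule.metric and sample.value > rule.threshold:
                firing.append((rule.name, sample.host, sample.value))
    return firing


# Example: samples from two hosts, collected in one place, checked by one rule set.
samples = [
    Sample("host-a", "load_1m", 0.7),
    Sample("host-b", "load_1m", 9.3),
    Sample("host-b", "swap_used_kb", 0.0),
]
rules = [AlertRule("HighLoad", "load_1m", 4.0)]
alerts = evaluate(rules, samples)
```

Because every host reports to the same store, a single rule such as HighLoad covers the whole fleet, and here it would flag only host-b. Real systems typically add concepts this sketch omits, such as requiring a condition to hold for a period of time before alerting.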