With many monitoring efforts beginning in the sysadmin/ops engineer team, it’s no wonder that many of us immediately associate “monitoring” with “the thing the sysadmins do.” This is unfortunate since we’ve seen there’s so much more to monitoring than just what happens on a server.
Of course, there’s an element of truth in the misconception: a lot really does happen on the server! Even in a serverless architecture, there are still servers underneath that provide the platform and all that makes it tick. We’re going to delve into the common services you’ll encounter on servers these days, what metrics and logs they provide, and how to make sense of it all.
One note before we jump in: this chapter is going to use Linux as the assumed operating system, since that’s what I’m most familiar with. For readers applying these lessons to Windows, nearly everything we’ll be covering applies just as well in a general sense, though your tools will be different.
Over the course of this book, I’ve railed against the obsession with the standard OS metrics (CPU, memory, load, network, disk) and for good reason: starting your monitoring work with them is starting with the metrics that offer the least signal of all toward your main concern (that your app is working). In order to know if things are working, you have to start at the top instead, which I covered xref.
However, that isn’t to say these metrics are without ...