Monitoring systems become critical as you scale. Effective monitoring can drastically ease the maintenance of services.

Having spoken to multiple experts in this field, I have collected the following advice on the subject:

  • Choose your key statistics carefully. Users don't care if your machine is low on CPU, but they do care if your API is slow.
  • Use aggregators; think about services, not machines. If you have more than a handful of machines, you should treat them as an amorphous blob.
  • Avoid the Wall of Graphs. It is slow to load, and it's information overload for a human. Each dashboard should have five graphs, with no more than five lines per graph.
  • Quantiles aren't aggregable, and they're hard to get meaningful information from. However, averages are ...
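
To make the point about quantiles concrete, here is a minimal sketch in Python. The machine names, the latency samples, and the `p95` helper are all made up for illustration, and the nearest-rank percentile is only one of several common definitions. It shows that averaging per-machine 95th percentiles does not give the 95th percentile of the service as a whole, whereas per-machine sums and counts can be merged into an exact service-wide mean.

```python
from statistics import mean


def p95(samples):
    """Return the 95th percentile using the nearest-rank method."""
    ordered = sorted(samples)
    # Integer ceiling of 0.95 * n, which avoids floating-point rounding.
    rank = max(1, (95 * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Hypothetical latency samples (in ms) from two machines behind one API.
machine_a = [10] * 99 + [500]  # 100 requests, mostly fast, one outlier
machine_b = [200] * 10         # 10 requests, uniformly slow
per_machine = {"a": machine_a, "b": machine_b}

# Tempting but wrong: average the per-machine quantiles.
avg_of_p95 = mean(p95(s) for s in per_machine.values())

# What the service actually experienced: the quantile over all requests.
service_p95 = p95(machine_a + machine_b)

# Means, by contrast, merge exactly from per-machine (sum, count) pairs.
total_sum = sum(sum(s) for s in per_machine.values())
total_count = sum(len(s) for s in per_machine.values())
merged_mean = total_sum / total_count

print(f"average of per-machine p95: {avg_of_p95} ms")
print(f"service-wide p95:           {service_p95} ms")
print(f"merged mean:                {merged_mean:.2f} ms")
```

With these samples the averaged figure comes out at 105 ms while the true service-wide p95 is 200 ms, because the average ignores how much traffic each machine handled. If you need service-level quantiles, one option is to ship raw samples or histogram buckets to the aggregator and compute the quantile there.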
