Chapter 20. Performance Engineering with Observability
Observability is often framed as a tool for reliability—keeping the lights on and fixing what breaks. However, a mature observability practice also serves as the foundation for performance engineering: the disciplined pursuit of making systems faster, more efficient, and more cost-effective. In this chapter, we explore how to move from reactive firefighting to proactive optimization by leveraging the overlap between these two fields.
The Case for Performance Engineering
Imagine you had a mystery bug that was silently adding latency to every request in your services. That was the situation our team found itself in, and it became the catalyst for our performance engineering journey.
In December 2021, the Honeycomb engineering team was surprised to find a discrepancy between the request latency for our ingest endpoint as measured by the load balancer logs and as measured by the autoinstrumentation spans in our application traces. The effect was most pronounced for POST requests to the ingest endpoint carrying single wide events rather than larger batches. Somehow, time was going missing: requests that took tenths of a millisecond to complete on the server side were showing up in the load balancer as taking single-digit milliseconds.
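The diagnostic pattern here can be sketched in a few lines: join the edge-observed latency (from load balancer logs) with the server-observed duration (from the root span of each trace) by request ID, and compute the unaccounted-for gap per request. This is a minimal illustration, not Honeycomb's actual tooling, and the field names and request IDs are hypothetical:

```python
# Hypothetical sketch: surface "missing time" by comparing the latency the
# load balancer saw against the duration the server-side root span recorded,
# keyed by request ID. Values and schema are illustrative only.

lb_latencies_ms = {
    "req-1": 4.2,   # total latency from the load balancer access log
    "req-2": 3.8,
    "req-3": 5.1,
}

span_durations_ms = {
    "req-1": 0.3,   # duration of the root autoinstrumentation span
    "req-2": 0.2,
    "req-3": 0.4,
}

def missing_time(lb: dict, spans: dict) -> dict:
    """Per-request gap between edge-observed and server-observed latency."""
    return {
        req_id: round(lb[req_id] - spans[req_id], 2)
        for req_id in lb.keys() & spans.keys()  # only requests seen by both
    }

gaps = missing_time(lb_latencies_ms, span_durations_ms)
for req_id, gap in sorted(gaps.items()):
    print(f"{req_id}: {gap} ms unaccounted for")
```

When the gap is a large multiple of the server-side duration, as in the single-event requests above, it points to time spent outside the instrumented handler: connection setup, request parsing, queueing, or middleware that the autoinstrumentation does not cover.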
Larger requests to our batch endpoint also showed the same discrepancy of single-digit millisecond variation between the data sources, but the difference was less noticeable as a proportion of the request ...