Chapter 6. Overhead, Costs, and Sampling
Defining the right set of spans to trace to understand your application can be a challenge—though a challenge worth rising to—but once you’ve done so, you’ll find yourself faced with another challenge: managing the torrent of spans as they’re emitted from your application. Even when your application is generating data at the right volume, it’s still important to understand the impact on the performance of your application and the cost of your computing infrastructure. The first tenet of distributed tracing—like all observability tools—should be to “first, do no harm.” Tracing can be implemented in a way that has negligible impact on your application, but managing the cost of the infrastructure can be more difficult.
Not all spans have equal value. Many spans represent run-of-the-mill requests that are (hopefully) bountiful within your application. While it’s useful to measure the performance of these requests and perhaps to have a few examples, chances are that just a handful will be sufficient. On the other hand, spans related to a rarely occurring bug or to a small but important user can provide critical insight into what’s happening and why.
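To make the idea of unequal value concrete, here is a minimal sketch of a retention policy, not a definitive implementation. It assumes the decision is made once per request, after the request is known to have failed or to belong to an important user. The names `RequestSummary`, `PRIORITY_CUSTOMERS`, `BASELINE_RATE`, and `keep` are hypothetical and do not come from any tracing library.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical values, chosen only for illustration.
BASELINE_RATE = 0.01                    # keep roughly 1% of ordinary requests
PRIORITY_CUSTOMERS = {"customer-42"}    # small but important users


@dataclass
class RequestSummary:
    """Assumed per-request facts available when the decision is made."""
    trace_id: str
    has_error: bool = False
    customer_id: str = ""


def keep(summary: RequestSummary, rate: float = BASELINE_RATE) -> bool:
    """Return True if the spans for this request should be retained."""
    # Rare or important requests are always worth keeping.
    if summary.has_error or summary.customer_id in PRIORITY_CUSTOMERS:
        return True
    # Map the trace ID to a number in [0, 1) so that the same request
    # always gets the same answer, then keep a fixed fraction.
    digest = hashlib.sha256(summary.trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    return bucket < rate
```

Because the outcome depends only on the request's trace ID and its summary, every span that shares that trace ID receives the same answer, so ordinary requests are thinned out while failing or high-priority ones are always retained.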
Above all, it’s important that the set of spans representing a single request is preserved as an atomic unit. If only a part of a request is available, then tracing has failed in its goal of providing end-to-end information about what’s happening. This means that while many spans might appear to ...