Resiliency

I want to finish this chapter with a brief discussion of the importance of designing a tracing backend that is resilient to potential, often unintentional, abuse. I am not talking about an under-provisioned cluster, as there is little that can be done there. While operating Jaeger at Uber, we experienced a number of tracing service degradations, and even outages, caused by a few common mistakes.

Over-sampling

During development, I often recommend that engineers configure the Jaeger tracer with 100% sampling. Sometimes the same configuration is inadvertently pushed to production, and if the service handles high traffic, the tracing backend gets flooded with tracing data. It does not necessarily kill the backend because, ...
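One possible safeguard against a development sampling configuration leaking into production is to clamp the configured rate by environment before it ever reaches the tracer. The following is a minimal sketch of that idea, not part of Jaeger itself: the function name `effective_sampling_rate`, the environment names, and the cap values (including the 0.1% production cap) are all hypothetical and chosen only for illustration.

```python
# Hypothetical per-environment caps on the sampling probability.
# These values are illustrative assumptions, not Jaeger defaults.
MAX_RATE_BY_ENV = {
    "development": 1.0,   # 100% sampling is fine on a developer machine
    "staging": 0.1,
    "production": 0.001,  # assumed cap for high-traffic services
}


def effective_sampling_rate(configured_rate: float, environment: str) -> float:
    """Return the sampling probability to use, clamped to the environment's cap.

    Unknown environments are treated like production, so that a typo in the
    environment name fails safe instead of enabling 100% sampling.
    """
    if not 0.0 <= configured_rate <= 1.0:
        raise ValueError(f"sampling rate must be in [0, 1], got {configured_rate}")
    cap = MAX_RATE_BY_ENV.get(environment, MAX_RATE_BY_ENV["production"])
    return min(configured_rate, cap)


if __name__ == "__main__":
    # A 100% development setting accidentally deployed to production is capped.
    print(effective_sampling_rate(1.0, "production"))
    # The same setting is left untouched in development.
    print(effective_sampling_rate(1.0, "development"))
```

The clamped value would then be passed to whatever probabilistic sampler the tracer is configured with. Defaulting unknown environments to the production cap is a deliberate choice in this sketch: it errs on the side of under-sampling rather than flooding the backend.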
