Resiliency

I want to finish this chapter with a brief discussion of the importance of designing a tracing backend that is resilient to potential, often unintentional, abuse. I am not talking about an under-provisioned cluster, as there is little that can be done there. While operating Jaeger at Uber, we experienced a number of tracing service degradations, and even outages, caused by a few common mistakes.

Over-sampling

During development, I often recommend that engineers configure the Jaeger tracer with 100% sampling. Sometimes the same configuration is inadvertently pushed to production, and if the service is one of those serving high traffic, the tracing backend gets flooded with tracing data. It does not necessarily kill the backend because, ...
