Chapter 12. Beyond Spans
Most distributed tracing systems running in production today represent requests as a tree of spans. This representation is simple to understand and well suited to many common workloads, but it isn’t a good fit for all of them. In this chapter, we’ll look at how the span came to be the first-class citizen of tracing, and then explore its shortcomings for systems such as machine learning models, streaming, pub-sub, and distributed dataflow. Devising new abstractions for tracing is an exciting area of active research and development, and we’ll try to give you a flavor of what’s coming in the near future.
Why Spans Have Prevailed
In Chapter 10, we described how early tracing systems influenced the design, and even the terminology, of present-day systems. As distributed systems evolved and grew more complicated, users had a pressing need to understand slowdowns in request-response processing, especially when requests were interleaved and executed concurrently. This led to an approach centered on remote procedure calls (RPCs), tightly integrated with the way the traced systems were implemented. Today, distributed systems have more diverse execution and communication patterns, and for many popular systems the “traditional” request-oriented tracing design is not a good fit. Nevertheless, treating the RPC, represented by a span, as the core data type of distributed tracing has served us well for many years. Let’s start by looking at the reasons.