Chapter 8. Improving Baseline Performance

In any production system, whether it’s a software system or a factory, the process by which the product is created has a profound impact on the cost of production and on the product itself. In modern software applications, production costs are mostly related to computing resources and other infrastructure, including the costs of buying and running servers in a private datacenter or renting them from a cloud provider. How that software is delivered also affects user experience. In this chapter, we consider how to reduce costs and improve user experience using distributed tracing.

In particular, we focus on improving baseline performance: that is, how the software performs over the course of weeks, months, or quarters. Understanding baseline performance will enable you to plan engineering work effectively over the coming weeks or months, maximizing your chances of having a positive impact. (In contrast, the following chapter will focus on approaches to restoring performance to that baseline when something has gone wrong.)

In the previous chapter, we discussed distributed tracing in the context of the “three pillars of observability.” In particular, we said that software developers and operators have the most to gain from distributed tracing and other observability tools when those tools take advantage of all three forms of performance telemetry: metrics, logs, and traces. As such, the approaches in this chapter will consider distributed ...
