Mastering Distributed Tracing

Book description

Understand how to apply distributed tracing to microservices-based architectures

Key Features

  • A thorough conceptual introduction to distributed tracing
  • An exploration of the most important open standards in the space
  • A how-to guide for code instrumentation and operating a tracing infrastructure

Book Description

Mastering Distributed Tracing will equip you to operate and enhance your own tracing infrastructure. Through practical exercises and code examples, you will learn how end-to-end tracing can be used as a powerful application performance management and comprehension tool.

The rise of Internet-scale companies such as Google and Amazon ushered in a new era of distributed systems operating on thousands of nodes across multiple data centers. Microservices increased that complexity, often exponentially. These systems are harder to debug: tracking down failures, detecting bottlenecks, and even simply understanding what is going on all become more difficult. Distributed tracing focuses on solving these problems for complex distributed systems. Today, tracing standards have matured and systems are much faster, making instrumentation less intrusive and the data more valuable.

Yuri Shkuro, the creator of Jaeger, a popular open-source distributed tracing system, delivers end-to-end coverage of the field in Mastering Distributed Tracing. Review the history and theoretical foundations of tracing; solve the data gathering problem through code instrumentation with open standards like OpenTracing, W3C Trace Context, and OpenCensus; and discuss the benefits and applications of a distributed tracing infrastructure for understanding and profiling complex systems.

What you will learn

  • Get started with a distributed tracing system
  • Get the most value out of end-to-end tracing
  • Learn about open standards in the space
  • Learn about code instrumentation and operating a tracing infrastructure
  • Learn where distributed tracing fits into microservices as a core function

Who this book is for

Any developer interested in testing large systems will find this book very revealing and, in places, surprising. Every microservice architect and developer should have some insight into distributed tracing, and the book will help them on their way. System administrators with some development skills will also benefit. No particular programming language skills are required, although an ability to read Java, while not essential, will help with the core chapters.

Table of contents

  1. Mastering Distributed Tracing
    1. Table of Contents
    2. Mastering Distributed Tracing
      1. Why subscribe?
      2. Packt.com
    3. Contributors
      1. About the author
      2. About the reviewer
      3. About the illustrator
      4. Packt is searching for authors like you
    4. Preface
      1. Who this book is for
      2. What this book covers
      3. To get the most out of this book
        1. Download the example code files
        2. Download the color images
        3. Conventions used
      4. Get in touch
        1. Reviews
    5. I. Introduction
      1. 1. Why Distributed Tracing?
        1. Microservices and cloud-native applications
        2. What is observability?
        3. The observability challenge of microservices
        4. Traditional monitoring tools
          1. Metrics
          2. Logs
        5. Distributed tracing
        6. My experience with tracing
        7. Why this book?
        8. Summary
        9. References
      2. 2. Take Tracing for a HotROD Ride
        1. Prerequisites
          1. Running from prepackaged binaries
          2. Running from Docker images
          3. Running from the source code
            1. Go language development environment
            2. Jaeger source code
        2. Start Jaeger
        3. Meet the HotROD
        4. The architecture
        5. The data flow
        6. Contextualized logs
        7. Span tags versus logs
        8. Identifying sources of latency
        9. Resource usage attribution
        10. Summary
        11. References
      3. 3. Distributed Tracing Fundamentals
        1. The idea
        2. Request correlation
          1. Black-box inference
          2. Schema-based
          3. Metadata propagation
        3. Anatomy of distributed tracing
        4. Sampling
        5. Preserving causality
          1. Inter-request causality
        6. Trace models
          1. Event model
          2. Span model
        7. Clock skew adjustment
        8. Trace analysis
        9. Summary
        10. References
    6. II. Data Gathering Problem
      1. 4. Instrumentation Basics with OpenTracing
        1. Prerequisites
          1. Project source code
          2. Go development environment
          3. Java development environment
          4. Python development environment
          5. MySQL database
          6. Query tools (curl or wget)
          7. Tracing backend (Jaeger)
        2. OpenTracing
        3. Exercise 1 – the Hello application
          1. Hello application in Go
          2. Hello application in Java
          3. Hello application in Python
          4. Exercise summary
        4. Exercise 2 – the first trace
          1. Step 1 – create a tracer instance
            1. Create a tracer in Go
            2. Create a tracer in Java
            3. Create a tracer in Python
          2. Step 2 – start a span
            1. Start a span in Go
            2. Start a span in Java
            3. Start a span in Python
          3. Step 3 – annotate the span
            1. Annotate the span in Go
            2. Annotate the span in Java
            3. Annotate the span in Python
          4. Exercise summary
        5. Exercise 3 – tracing functions and passing context
          1. Step 1 – trace individual functions
            1. Trace individual functions in Go
            2. Trace individual functions in Java
            3. Trace individual functions in Python
          2. Step 2 – combine multiple spans into a single trace
            1. Combine multiple spans into a single trace in Go
            2. Combine multiple spans into a single trace in Java
            3. Combine multiple spans into a single trace in Python
          3. Step 3 – propagate the in-process context
            1. In-process context propagation in Python
            2. In-process context propagation in Java
            3. In-process context propagation in Go
          4. Exercise summary
        6. Exercise 4 – tracing RPC requests
          1. Step 1 – break up the monolith
            1. Microservices in Go
            2. Microservices in Java
            3. Microservices in Python
          2. Step 2 – pass the context between processes
            1. Passing context between processes in Go
            2. Passing context between processes in Java
            3. Passing context between processes in Python
          3. Step 3 – apply OpenTracing-recommended tags
            1. Standard tags in Go
            2. Standard tags in Java
            3. Standard tags in Python
          4. Exercise summary
        7. Exercise 5 – using baggage
          1. Using baggage in Go
          2. Using baggage in Java
          3. Using baggage in Python
          4. Exercise summary
        8. Exercise 6 – auto-instrumentation
          1. Open source instrumentation in Go
          2. Auto-instrumentation in Java
          3. Auto-instrumentation in Python
        9. Exercise 7 – extra credit
        10. Summary
        11. References
      2. 5. Instrumentation of Asynchronous Applications
        1. Prerequisites
          1. Project source code
          2. Java development environment
          3. Kafka, Zookeeper, Redis, and Jaeger
        2. The Tracing Talk chat application
          1. Implementation
            1. The lib module
              1. AppId
              2. Message
              3. KafkaConfig and KafkaService
              4. RedisConfig and RedisService
              5. GiphyService
            2. The chat-api service
            3. The storage-service microservice
            4. The giphy-service microservice
          2. Running the application
          3. Observing traces
        3. Instrumenting with OpenTracing
          1. Spring instrumentation
          2. Tracer resolver
          3. Redis instrumentation
          4. Kafka instrumentation
            1. Producing messages
            2. Consuming messages
        4. Instrumenting asynchronous code
        5. Summary
        6. References
      3. 6. Tracing Standards and Ecosystem
        1. Styles of instrumentation
        2. Anatomy of tracing deployment and interoperability
        3. Five shades of tracing
        4. Know your audience
        5. The ecosystem
          1. Tracing systems
            1. Zipkin and OpenZipkin
            2. Jaeger
            3. SkyWalking
          2. X-Ray, Stackdriver, and more
          3. Standards projects
            1. W3C Trace Context
            2. W3C "Data Interchange Format"
            3. OpenCensus
            4. OpenTracing
        6. Summary
        7. References
      4. 7. Tracing with Service Meshes
        1. Service meshes
        2. Observability via a service mesh
        3. Prerequisites
          1. Project source code
          2. Java development environment
          3. Kubernetes
          4. Istio
        4. The Hello application
        5. Distributed tracing with Istio
        6. Using Istio to generate a service graph
        7. Distributed context and routing
        8. Summary
        9. References
      5. 8. All About Sampling
        1. Head-based consistent sampling
          1. Probabilistic sampling
          2. Rate limiting sampling
          3. Guaranteed-throughput probabilistic sampling
          4. Adaptive sampling
            1. Local adaptive sampling
            2. Global adaptive sampling
              1. Goals
              2. Theory
              3. Architecture
              4. Calculating sampling probability
            3. Implications of adaptive sampling
            4. Extensions
          5. Context-sensitive sampling
          6. Ad-hoc or debug sampling
          7. How to deal with oversampling
            1. Post-collection down-sampling
            2. Throttling
        2. Tail-based consistent sampling
        3. Partial sampling
        4. Summary
        5. References
    7. III. Getting Value from Tracing
      1. 9. Turning the Lights On
        1. Tracing as a knowledge base
        2. Service graphs
          1. Deep, path-aware service graphs
          2. Detecting architectural problems
        3. Performance analysis
          1. Critical path analysis
          2. Recognizing trace patterns
            1. Look for error markers
            2. Look for the longest span on the critical path
            3. Look out for missing details
            4. Avoid sequential execution or "staircase"
            5. Be wary when things finish at exactly the same time
          3. Exemplars
          4. Latency histograms
        4. Long-term profiling
        5. Summary
        6. References
      2. 10. Distributed Context Propagation
        1. Brown Tracing Plane
        2. Pivot tracing
        3. Chaos engineering
        4. Traffic labeling
          1. Testing in production
          2. Debugging in production
          3. Developing in production
        5. Summary
        6. References
      3. 11. Integration with Metrics and Logs
        1. Three pillars of observability
        2. Prerequisites
          1. Project source code
          2. Java development environment
          3. Running the servers in Docker
          4. Declaring index pattern in Kibana
          5. Running the clients
        3. The Hello application
        4. Integration with metrics
          1. Standard metrics via tracing instrumentation
          2. Adding context to metrics
          3. Context-aware metrics APIs
        5. Integration with logs
          1. Structured logging
          2. Correlating logs with trace context
          3. Context-aware logging APIs
          4. Capturing logs in the tracing system
          5. Do we need separate logging and tracing backends?
        6. Summary
        7. References
      4. 12. Gathering Insights with Data Mining
        1. Feature extraction
        2. Components of a data mining pipeline
          1. Tracing backend
          2. Trace completion trigger
          3. Feature extractor
          4. Aggregator
        3. Feature extraction exercise
        4. Prerequisites
          1. Project source code
          2. Running the servers in Docker
          3. Defining index mapping in Elasticsearch
          4. Java development environment
          5. Microservices simulator
            1. Running as a Docker image
            2. Running from source
            3. Verify
          6. Define an index pattern in Kibana
        5. The Span Count job
          1. Trace completion trigger
          2. Feature extractor
        6. Observing trends
          1. Beware of extrapolations
        7. Historical analysis
        8. Ad hoc analysis
        9. Summary
        10. References
    8. IV. Deploying and Operating Tracing Infrastructure
      1. 13. Implementing Tracing in Large Organizations
        1. Why is it hard to deploy tracing instrumentation?
        2. Reduce the barrier to adoption
          1. Standard frameworks
          2. In-house adapter libraries
          3. Tracing enabled by default
          4. Monorepos
          5. Integration with existing infrastructure
        3. Where to start
        4. Building the culture
          1. Explaining the value
          2. Integrating with developer workflows
        5. Tracing Quality Metrics
        6. Troubleshooting guide
        7. Don't be on the critical path
        8. Summary
        9. References
      2. 14. Under the Hood of a Distributed Tracing System
        1. Why host your own?
          1. Customizations and integrations
          2. Bandwidth cost
          3. Own the data
        2. Bet on emerging standards
        3. Architecture and deployment modes
          1. Basic architecture: agent + collector + query service
            1. Client
            2. Agent
            3. Collector
            4. Query service and UI
            5. Data mining jobs
          2. Streaming architecture
          3. Multi-tenancy
            1. Cost accounting
            2. Complete isolation
            3. Granular access controls
          4. Security
          5. Running in multiple DCs
            1. Capturing origin zone
            2. Cross-zone federation
        4. Monitoring and troubleshooting
        5. Resiliency
          1. Over-sampling
          2. Debug traces
          3. Traffic spikes due to DC failover
          4. Perpetual traces
          5. Very long traces
        6. Summary
        7. References
      3. 15. Afterword
        1. References
    9. Other Books You May Enjoy
    10. Leave a review - let other readers know what you think
    11. Index

Product information

  • Title: Mastering Distributed Tracing
  • Author(s): Yuri Shkuro
  • Release date: February 2019
  • Publisher(s): Packt Publishing
  • ISBN: 9781788628464