SRE with Java Microservices

Book description

In a microservices architecture, the whole is indeed greater than the sum of its parts. In practice, however, individual microservices can inadvertently affect one another and alter the end-user experience. Effective microservices architectures therefore require standardization at the organizational level, with the help of a platform engineering team.

This practical book provides a series of progressive steps that platform engineers can apply technically and organizationally to achieve highly resilient Java applications. Author Jonathan Schneider covers many effective SRE practices from companies leading the way in microservices adoption. You’ll examine several patterns discovered through much trial and error in recent years, complete with Java code examples.

Chapters are organized according to specific patterns, including:

  • Application metrics: Monitoring for availability with Micrometer
  • Debugging with observability: Logging and distributed tracing; failure injection testing
  • Charting and alerting: Building effective charts; KPIs for Java microservices
  • Safe multicloud delivery: Spinnaker, deployment strategies, and automated canary analysis
  • Source code observability: Dependency management, API utilization, and end-to-end asset inventory
  • Traffic management: Concurrency of systems; platform, gateway, and client-side load balancing

Table of contents

  1. Foreword
  2. Preface
    1. My Journey
    2. Conventions Used in This Book
    3. O’Reilly Online Learning
    4. How to Contact Us
    5. Acknowledgments
  3. 1. The Application Platform
    1. Platform Engineering Culture
    2. Monitoring
      1. Monitoring for Availability
      2. Monitoring as a Debugging Tool
      3. Learning to Expect Failure
      4. Effective Monitoring Builds Trust
    3. Delivery
    4. Traffic Management
    5. Capabilities Not Covered
      1. Testing Automation
      2. Chaos Engineering and Continuous Verification
      3. Configuration as Code
    6. Encapsulating Capabilities
      1. Service Mesh
    7. Summary
  4. 2. Application Metrics
    1. Black Box Versus White Box Monitoring
    2. Dimensional Metrics
    3. Hierarchical Metrics
    4. Micrometer Meter Registries
    5. Creating Meters
    6. Naming Metrics
      1. Common Tags
    7. Classes of Meters
    8. Gauges
    9. Counters
    10. Timers
      1. “Count” Means “Throughput”
      2. “Count” and “Sum” Together Mean “Aggregable Average”
      3. Maximum Is a Decaying Signal That Isn’t Aligned to the Push Interval
      4. The Sum of Sum Over an Interval
      5. The Base Unit of Time
      6. Using Timers
      7. Common Features of Latency Distributions
      8. Percentiles/Quantiles
      9. Histograms
      10. Service Level Objective Boundaries
    11. Distribution Summaries
    12. Long Task Timers
    13. Choosing the Right Meter Type
    14. Controlling Cost
    15. Coordinated Omission
    16. Load Testing
    17. Meter Filters
      1. Deny/Accept Meters
      2. Transforming Metrics
      3. Configuring Distribution Statistics
    18. Separating Platform and Application Metrics
    19. Partitioning Metrics by Monitoring System
    20. Meter Binders
    21. Summary
  5. 3. Debugging with Observability
    1. The Three Pillars of Observability…or Is It Two?
      1. Logs
      2. Distributed Tracing
      3. Metrics
      4. Which Telemetry Is Appropriate?
    2. Components of a Distributed Trace
    3. Types of Distributed Tracing Instrumentation
      1. Manual Tracing
      2. Agent Tracing
      3. Framework Tracing
      4. Service Mesh Tracing
      5. Blended Tracing
    4. Sampling
      1. No Sampling
      2. Rate-Limiting Samplers
      3. Probabilistic Samplers
      4. Boundary Sampling
      5. Impact of Sampling on Anomaly Detection
    5. Distributed Tracing and Monoliths
    6. Correlation of Telemetry
      1. Metric to Trace Correlation
    7. Using Trace Context for Failure Injection and Experimentation
    8. Summary
  6. 4. Charting and Alerting
    1. Differences in Monitoring Systems
    2. Effective Visualizations of Service Level Indicators
      1. Styles for Line Width and Shading
      2. Errors Versus Successes
      3. “Top k” Visualizations
      4. Prometheus Rate Interval Selection
    3. Gauges
    4. Counters
    5. Timers
    6. When to Stop Creating Dashboards
    7. Service Level Indicators for Every Java Microservice
      1. Errors
      2. Latency
      3. Garbage Collection Pause Times
      4. Heap Utilization
      5. CPU Utilization
      6. File Descriptors
      7. Suspicious Traffic
      8. Batch Runs or Other Long-Running Tasks
    8. Building Alerts Using Forecasting Methods
      1. Naive Method
      2. Single-Exponential Smoothing
      3. Universal Scalability Law
    9. Summary
  7. 5. Safe, Multicloud Continuous Delivery
    1. Types of Platforms
    2. Resource Types
    3. Delivery Pipelines
    4. Packaging for the Cloud
      1. Packaging for IaaS Platforms
      2. Packaging for Container Schedulers
    5. The Delete + None Deployment
    6. The Highlander
    7. Blue/Green Deployment
    8. Automated Canary Analysis
      1. Spinnaker with Kayenta
      2. General-Purpose Canary Metrics for Every Microservice
    9. Summary
  8. 6. Source Code Observability
    1. The Stateful Asset Inventory
    2. Release Versioning
      1. Maven Repositories
      2. Build Tools for Release Versioning
    3. Capturing Resolved Dependencies in Metadata
    4. Capturing Method-Level Utilization of the Source Code
      1. Structured Code Search with OpenRewrite
    5. Dependency Management
      1. Version Misalignments
      2. Dynamic Version Constraints
      3. Unused Dependencies
      4. Undeclared Explicitly Used Dependencies
    6. Summary
  9. 7. Traffic Management
    1. Microservices Offer More Potential Failure Points
    2. Concurrency of Systems
    3. Platform Load Balancing
    4. Gateway Load Balancing
      1. Join the Shortest Queue
      2. Instance-Reported Availability and Utilization
      3. Health Checks
      4. Choice of Two
      5. Instance Probation
      6. Knock-On Effects of Smarter Load Balancing
    5. Client-Side Load Balancing
    6. Hedge Requests
    7. Call Resiliency Patterns
      1. Retries
      2. Rate Limiters
      3. Bulkheads
      4. Circuit Breakers
      5. Adaptive Concurrency Limits
      6. Choosing the Right Call Resiliency Pattern
      7. Implementation in Service Mesh
      8. Implementation in RSocket
    8. Summary
  10. Index

Product information

  • Title: SRE with Java Microservices
  • Author(s): Jonathan Schneider
  • Release date: September 2020
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781492073925