
SRE with Java Microservices

Name: SRE with Java Microservices
Author: Jonathan Schneider
ISBN: 9781492073925


September 2020

Intermediate to advanced

314 pages

8h 22m

English

O'Reilly Media, Inc.


Foreword
Preface
  My Journey
  Conventions Used in This Book
  O’Reilly Online Learning
  How to Contact Us
  Acknowledgments
1. The Application Platform
  Platform Engineering Culture
  Monitoring
  Monitoring for Availability
  Monitoring as a Debugging Tool
  Learning to Expect Failure
  Effective Monitoring Builds Trust
  Delivery
  Traffic Management
  Capabilities Not Covered
  Testing Automation
  Chaos Engineering and Continuous Verification
  Configuration as Code
  Encapsulating Capabilities
  Service Mesh
  Summary
2. Application Metrics
  Black Box Versus White Box Monitoring
  Dimensional Metrics
  Hierarchical Metrics
  Micrometer Meter Registries
  Creating Meters
  Naming Metrics
  Common Tags
  Classes of Meters
  Gauges
  Counters
  Timers
  “Count” Means “Throughput”
  “Count” and “Sum” Together Mean “Aggregable Average”
  Maximum Is a Decaying Signal That Isn’t Aligned to the Push Interval
  The Sum of Sum Over an Interval
  The Base Unit of Time
  Using Timers
  Common Features of Latency Distributions
  Percentiles/Quantiles
  Histograms
  Service Level Objective Boundaries
  Distribution Summaries
  Long Task Timers
  Choosing the Right Meter Type
  Controlling Cost
  Coordinated Omission
  Load Testing
  Meter Filters
  Deny/Accept Meters
  Transforming Metrics
  Configuring Distribution Statistics
  Separating Platform and Application Metrics
  Partitioning Metrics by Monitoring System
  Meter Binders
  Summary
3. Debugging with Observability
  The Three Pillars of Observability…or Is It Two?
  Logs
  Distributed Tracing
  Metrics
  Which Telemetry Is Appropriate?
  Components of a Distributed Trace
  Types of Distributed Tracing Instrumentation
  Manual Tracing
  Agent Tracing
  Framework Tracing
  Service Mesh Tracing
  Blended Tracing
  Sampling
  No Sampling
  Rate-Limiting Samplers
  Probabilistic Samplers
  Boundary Sampling
  Impact of Sampling on Anomaly Detection
  Distributed Tracing and Monoliths
  Correlation of Telemetry
  Metric to Trace Correlation
  Using Trace Context for Failure Injection and Experimentation
  Summary
4. Charting and Alerting
  Differences in Monitoring Systems
  Effective Visualizations of Service Level Indicators
  Styles for Line Width and Shading
  Errors Versus Successes
  “Top k” Visualizations
  Prometheus Rate Interval Selection
  Gauges
  Counters
  Timers
  When to Stop Creating Dashboards
  Service Level Indicators for Every Java Microservice
  Errors
  Latency
  Garbage Collection Pause Times
  Heap Utilization
  CPU Utilization
  File Descriptors
  Suspicious Traffic
  Batch Runs or Other Long-Running Tasks
  Building Alerts Using Forecasting Methods
  Naive Method
  Single-Exponential Smoothing
  Universal Scalability Law
  Summary
5. Safe, Multicloud Continuous Delivery
  Types of Platforms
  Resource Types
  Delivery Pipelines
  Packaging for the Cloud
  Packaging for IaaS Platforms
  Packaging for Container Schedulers
  The Delete + None Deployment
  The Highlander
  Blue/Green Deployment
  Automated Canary Analysis
  Spinnaker with Kayenta
  General-Purpose Canary Metrics for Every Microservice
  Summary
6. Source Code Observability
  The Stateful Asset Inventory
  Release Versioning
  Maven Repositories
  Build Tools for Release Versioning
  Capturing Resolved Dependencies in Metadata
  Capturing Method-Level Utilization of the Source Code
  Structured Code Search with OpenRewrite
  Dependency Management
  Version Misalignments
  Dynamic Version Constraints
  Unused Dependencies
  Undeclared Explicitly Used Dependencies
  Summary
7. Traffic Management
  Microservices Offer More Potential Failure Points
  Concurrency of Systems
  Platform Load Balancing
  Gateway Load Balancing
  Join the Shortest Queue
  Instance-Reported Availability and Utilization
  Health Checks
  Choice of Two
  Instance Probation
  Knock-On Effects of Smarter Load Balancing
  Client-Side Load Balancing
  Hedge Requests
  Call Resiliency Patterns
  Retries
  Rate Limiters
  Bulkheads
  Circuit Breakers
  Adaptive Concurrency Limits
  Choosing the Right Call Resiliency Pattern
  Implementation in Service Mesh
  Implementation in RSocket
  Summary
Index

Content preview from SRE with Java Microservices

Chapter 4. Charting and Alerting

Monitoring doesn’t have to be an all-in proposition. If you only add a measure of error ratio for end-user interactions where you have no monitoring (or only resource monitoring like CPU/memory utilization), you’ve already taken a huge step forward in terms of understanding your software. After all, CPU and memory can look good while a user-facing API is failing 5% of all requests, and failure rate is a much easier idea to communicate between engineering organizations and their business partners.
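
To make the error ratio idea concrete, here is a minimal sketch of the arithmetic in plain Java. It is not taken from the book: the class name `ErrorRatio`, the method `of`, and the request counts are all hypothetical, and a real service would read these counts from its metrics instrumentation rather than from hardcoded values.

```java
public class ErrorRatio {
    /**
     * Fraction of requests that failed, in the range [0, 1].
     * Returns 0 when there was no traffic, which is an assumption of this
     * sketch; a real alert may want to treat "no traffic" separately.
     */
    public static double of(long failed, long total) {
        if (failed < 0 || total < 0 || failed > total) {
            throw new IllegalArgumentException("need 0 <= failed <= total");
        }
        return total == 0 ? 0.0 : (double) failed / total;
    }

    public static void main(String[] args) {
        // Hypothetical counts for one measurement interval.
        long total = 2_000;
        long failed = 100;
        System.out.printf("error ratio: %.1f%%%n", 100 * of(failed, total));
    }
}
```

With 100 failures out of 2,000 requests the ratio is 0.05, the "5% of all requests" case described above. Nothing in that number depends on CPU or memory readings, which is why it can expose a problem that resource monitoring alone would miss.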

While Chapters 2 and 3 covered different forms of monitoring instrumentation, here we present the ways we can use that data effectively to promote action via alerting and visualization. This chapter covers three main topics.

First, we should think about what makes for a good visualization of an SLI. We’re only going to show charts from the commonly used Grafana charting and alerting tool, because it is a freely available open source tool that has datasource plug-ins for many different monitoring systems (so learning a little Grafana is a largely transferable skill from one monitoring system to another). Many of the same suggestions apply to charting solutions integrated into vendor products.

Next, we’ll discuss specifics about the measurements that generate the most value and how to visualize and alert on them. Treat these as a checklist of SLIs that you can add incrementally. Incrementalism may even be preferable to implementing them all at once, because ...


Publisher Resources

ISBN: 9781492073918
