Google SRE工作手册

by Betsy Beyer, Niall Richard Murphy, David K. Rensin, Kent Kawahara, Stephen Thorne
September 2020
Intermediate to advanced
526 pages
8h 23m
Chinese
China Electric Power Press Ltd.
Content preview from Google SRE工作手册
Alerting on SLOs
Extreme Availability Goals
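Services with an extremely low or an extremely high availability goal may need special treatment. For example, consider a service with a 90% availability target. As Table 5-8 shows, an alert fires when the service consumes 2% of its error budget within one hour. But a 100% outage consumes only 1.4% of the budget in that hour, so this alert can never fire. If your error budget is set over a long time period, you may need to tune your alerting parameters.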
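
The arithmetic behind the 90% example can be sketched as follows. This is a minimal illustration, not the book's code: the 30-day budget window is an assumption (it is consistent with the 1.4% figure above), and the function names are hypothetical.

```python
# Assumption: the error budget is measured over a 30-day window.
WINDOW_HOURS = 30 * 24


def max_budget_fraction_consumed(slo: float, alert_window_hours: float) -> float:
    """Fraction of the total error budget that a 100% outage burns
    during an alert window of the given length."""
    error_budget = 1.0 - slo          # allowed failure ratio, e.g. 0.10 for 90%
    burn_rate = 1.0 / error_budget    # error rate of 1.0 relative to the budget
    return burn_rate * alert_window_hours / WINDOW_HOURS


def alert_can_fire(slo: float, alert_window_hours: float, threshold: float) -> bool:
    """True if even a total outage can reach the alert's budget threshold."""
    return max_budget_fraction_consumed(slo, alert_window_hours) >= threshold


consumed = max_budget_fraction_consumed(0.90, 1)
print(f"{consumed:.1%}")               # 1.4%
print(alert_can_fire(0.90, 1, 0.02))   # False: the 2%-in-one-hour alert is unreachable
```

A check like this could be run whenever alert parameters are reused across services with very different targets, to catch thresholds that no outage can reach.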
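For services with an extremely high availability goal, a 100% outage exhausts the budget in a very short time. For a service with a monthly availability target of 99.999%, a 100% outage exhausts the budget in 26 seconds. That is shorter than the metric collection interval of many monitoring services, let alone the end-to-end time needed to fire an alert, which still has to pass through notification systems such as email and SMS. Even if the alert goes straight to a system that can resolve the problem automatically, the error budget will most likely be gone before the problem is mitigated.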
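
The 26-second figure can be reproduced with a short calculation. This sketch assumes that "monthly" means a 30-day window; the function name is hypothetical.

```python
# Assumption: a "monthly" target is measured over 30 days.
WINDOW_SECONDS = 30 * 24 * 60 * 60


def seconds_to_exhaust_budget(slo: float, error_rate: float = 1.0) -> float:
    """Time for a sustained error rate to consume the whole error budget.

    error_rate is the fraction of requests failing (1.0 means a 100% outage).
    """
    error_budget = 1.0 - slo
    return WINDOW_SECONDS * error_budget / error_rate


# A total outage against a 99.999% target lasts roughly 26 seconds
# before the budget is gone.
print(round(seconds_to_exhaust_budget(0.99999), 2))  # 25.92
```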
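Being notified that you have only 26 seconds of budget left is not necessarily a bad strategy, but it does not help you defend the SLO. The only way to achieve this level of reliability is to design a system in which a 100% outage is extremely unlikely. That way, you may be able to fix the problem before the budget runs out. For example, if you always roll a change out to 1% of users first, your burn rate drops to 1% of what it would otherwise be, and it now takes 43 minutes to exhaust the budget. For strategies on designing such a system, see Chapter 16.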
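
The effect of a staged rollout on time to exhaustion can be sketched in the same way. The 30-day window and the function name are assumptions made for illustration; the model simply treats every request from the affected users as failing.

```python
# Assumption: the error budget is measured over a 30-day window.
WINDOW_MINUTES = 30 * 24 * 60


def minutes_to_exhaust_budget(slo: float, affected_fraction: float) -> float:
    """Minutes until the budget is gone when a bad change fails all
    requests for the given fraction of users."""
    error_budget = 1.0 - slo
    return WINDOW_MINUTES * error_budget / affected_fraction


# Full exposure: about 0.43 minutes (26 seconds).
print(round(minutes_to_exhaust_budget(0.99999, 1.0), 2))   # 0.43
# 1% exposure: the burn rate is 100 times lower, leaving about 43 minutes.
print(round(minutes_to_exhaust_budget(0.99999, 0.01), 1))  # 43.2
```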
Alerting at Scale
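When you scale your service, make sure your alerting strategy scales with it. You might be tempted to set custom alerting parameters for each service. But if your service comprises 100 microservices (or is a single service with 100 different request types), the toil and the case-by-case analysis of special cases will quickly ...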

Publisher Resources

ISBN: 9787519845858