Google SRE工作手册

by Betsy Beyer, Niall Richard Murphy, David K. Rensin, Kent Kawahara, Stephen Thorne
September 2020
526 pages
Chinese
China Electric Power Press Ltd.
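Over the next eight hours, the response team worked to resolve and mitigate the problem. When the procedures in our runbooks failed to resolve it, the response team began methodically trying new recovery approaches. Meanwhile, we rotated the on-call engineers and the incident commander every four hours. This was meant to ensure the engineers got adequate rest and to bring fresh ideas into the response team.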
5:33 a.m.
The SRE on-call engineer made a configuration change to the NTP servers.
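6:13 a.m.
After the incident commander checked in with the on-call engineer for each service, they confirmed that all services had returned to normal. Once this was confirmed, the incident commander closed the conference call and the Slack channel and declared the incident resolved. Given the broad impact of the NTP service, a postmortem was warranted. Before wrapping up work on the incident, the incident commander assigned the task of writing the postmortem to the on-call engineer of the NTP service's SRE team.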
Tools Used in Incident Response
Our incident response process relies primarily on the following three tools:
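PagerDuty
We store all on-call information, service ownership, postmortems, incident metadata, and similar information in PagerDuty. When something goes wrong, this lets us quickly identify the teams involved in the incident.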
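Slack
We maintain a dedicated channel (#incident-war-room) where subject-matter experts and the incident commander gather to work through the problem. The channel serves mainly as an information hub, where you can find a record of actions taken, who is responsible for what, timestamps, and similar information.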
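Conference calls
When on-call engineers are asked to join the incident response team, they dial a fixed number to join the conference call. When coordinating our work, we tend to make decisions on the conference call and then record the results ...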
ISBN: 9787519845858