Appendix B. Example Error Budget Policy
Status | Published |
---|---|
Author | Steven Thurgood |
Date | 2018-02-19 |
Reviewers | David Ferguson |
Approvers | Betsy Beyer |
Approval date | 2018-02-20 |
Revisit date | 2019-02-01 |
Service Overview
The Example Game Service allows Android and iPhone users to play a game with each other. New releases of the backend code are pushed daily. New releases of clients are pushed weekly. This policy applies to both backend and client releases.
Goals
The goals of this policy are to:
- Protect customers from repeated SLO misses
- Provide an incentive to balance reliability with other features
Non-Goals
This policy is not intended to serve as a punishment for missing SLOs. Halting change is undesirable; this policy gives teams permission to focus exclusively on reliability when data indicates that reliability is more important than other product features.
SLO Miss Policy
If the service is performing at or above its SLO, then releases (including data changes) will proceed according to the release policy.
If the service has exceeded its error budget for the preceding four-week window, we will halt all changes and releases other than P0 issues or security fixes until the service is back within its SLO.
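The release rule above can be sketched as a small check. This is only an illustrative sketch, not part of the policy: the 99.9% SLO target, the request-count model of the error budget, and the names `FourWeekWindow`, `budget_consumed`, and `release_allowed` are all assumptions made for the example.

```python
from dataclasses import dataclass

# Assumed SLO target for illustration only; the policy does not state one.
SLO_TARGET = 0.999


@dataclass
class FourWeekWindow:
    """Request counts for the preceding four-week window (hypothetical model)."""
    total_requests: int
    failed_requests: int


def budget_consumed(window: FourWeekWindow, slo_target: float = SLO_TARGET) -> float:
    """Return the fraction of the error budget spent (above 1.0 means exceeded)."""
    # The error budget is the number of failures the SLO tolerates in the window.
    allowed_failures = (1 - slo_target) * window.total_requests
    if allowed_failures == 0:
        return 0.0 if window.failed_requests == 0 else float("inf")
    return window.failed_requests / allowed_failures


def release_allowed(
    window: FourWeekWindow,
    is_highest_priority: bool = False,
    is_security_fix: bool = False,
    slo_target: float = SLO_TARGET,
) -> bool:
    """Apply the SLO miss rule to a single proposed change or release."""
    # Within budget: releases proceed according to the normal release policy.
    if budget_consumed(window, slo_target) <= 1.0:
        return True
    # Budget exceeded: only highest-priority issues and security fixes go out.
    return is_highest_priority or is_security_fix


healthy = FourWeekWindow(total_requests=1_000_000, failed_requests=500)
blown = FourWeekWindow(total_requests=1_000_000, failed_requests=2_000)

print(release_allowed(healthy))                      # True
print(release_allowed(blown))                        # False
print(release_allowed(blown, is_security_fix=True))  # True
```

In practice the window counts would come from the service's monitoring system, and the check would typically sit in the release pipeline rather than being run by hand.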
Depending upon the cause of the SLO miss, the team may devote additional resources to working on reliability instead of feature work.
The team must work on reliability if:
- A code bug or procedural error caused the service itself to exceed the error budget. ...