Practical Site Reliability Engineering
by Pethuru Raj Chelliah, Shreyash Naithani, Shailender Singh
The importance of SREs
An SRE is responsible for ensuring the systems availability, performance-monitoring, and incident response of the cloud IT platforms and services. SREs must make sure that all software applications entering production environments fully comply with a set of important requirements, such as diagrams, network topology illustrations, service dependency details, monitoring and logging plans, backups, and so on. A software application may fully comply with all of the functional requirements, but there are other sources for disruption and interruption. There may be hardware degradation, networking problems, high usage of resources, or slow responses from applications, and services could happen at any time. SREs always need ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Read now
Unlock full access