An SRE is responsible for ensuring the systems availability, performance-monitoring, and incident response of the cloud IT platforms and services. SREs must make sure that all software applications entering production environments fully comply with a set of important requirements, such as diagrams, network topology illustrations, service dependency details, monitoring and logging plans, backups, and so on. A software application may fully comply with all of the functional requirements, but there are other sources for disruption and interruption. There may be hardware degradation, networking problems, high usage of resources, or slow responses from applications, and services could happen at any time. SREs always need ...
The importance of SREs
Get Practical Site Reliability Engineering now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.