Breaking into Site Reliability Engineering: Learn the Core Concepts and Best Practices of Successful SRE
with Swapnil Shevate
Overview
Site Reliability Engineering (SRE), as the name suggests, is the art of maintaining the stability, quality, and therefore reliability of a service or feature that an application offers its end users. The reliability of production systems directly affects your company's revenue and is one of the most important factors in growing your business over time. Many organizations use SRE to ensure that the critical applications they build remain available throughout their lifespan, even during seasonal traffic peaks, infrastructure maintenance, and planned or unplanned software updates. Site reliability engineers are responsible for maintaining the uptime of these systems and need a broad set of skills to maximize availability and minimize downtime.
In this course, you will learn what it takes to be a modern SRE, with a deep dive into its principles and core concepts. We will cover observability (metrics and logging), monitoring, change management, SLOs, disaster recovery, scaling, tooling, troubleshooting, and timeline-based, blameless postmortems. By the end of this course, you will have a detailed understanding of who an SRE is, what exactly they do, and what it takes to be a successful SRE.
What you’ll learn and how to apply it
- Understand Site Reliability Engineering fundamentals and core principles
- Explore topics SREs deal with day to day and build those practices into your SRE routine
- Better understand important concepts that you may not find in a formal course or curriculum
- Gain an understanding of why and how an SRE plays a critical role in their organization
This course is for you because
- You work in IT and are looking for content to help you move into an SRE role.
- You're an SRE or DevOps professional who wants to deepen your overall understanding of the core principles.
- You're an engineer who wants to understand the underlying principles of Site Reliability Engineering and expand your DevOps knowledge.
- You're a recent college graduate looking to start a career as an effective SRE.
Prerequisites
- Basic understanding of core computer science concepts such as the software development life cycle (SDLC), system administration, and infrastructure
- Familiarity with a shell terminal
- Basic familiarity with Linux
Course Materials
GitHub repository