Site Reliability Engineering Essentials
on-demand course


with Karun Subramanian
January 2025
Intermediate
4h 9m
English
Pearson
Closed Captioning available in English

Overview

4+ Hours of Video Instruction

Master the essentials of Site Reliability Engineering to effectively manage production systems with real-world insights and techniques.

Unlock the power of Site Reliability Engineering (SRE) with this comprehensive video course. SRE is a critical discipline that combines software engineering with IT operations to ensure high system reliability, scalability, and performance. This course provides a deep dive into the core principles and practices of SRE, equipping you with the tools to build reliable systems and improve operational efficiency.

The course covers key SRE concepts, including Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets, with practical examples that help you apply these principles to your own organization. You will learn how to build and optimize a robust monitoring and observability system using essential telemetry data, such as logs, metrics, and traces. Through an in-depth exploration of observability platforms, you will learn how to effectively monitor and maintain system health.

The course also addresses crucial aspects of incident management, such as managing on-call duties, running war rooms for critical incidents, and conducting blameless postmortems to learn from failures. Gain insights into reliable system architecture patterns and principles, such as load balancing, auto-scaling, and the CAP theorem, to ensure your infrastructure remains resilient under high traffic.

Additionally, you will discover release management strategies that minimize user impact during deployments, and you will learn how to monitor your CI/CD pipeline and ensure progressive rollouts. The course also guides you through implementing SRE practices within your organization, including setting up a central SRE team and conducting production readiness reviews to ensure your systems are always production ready.

By the end of this course, you will have a solid understanding of SRE best practices and the knowledge to enhance the reliability and scalability of your systems while reducing downtime and improving overall operational efficiency.

Learn How To:

  • Set a strong foundation by implementing core Site Reliability Engineering (SRE) principles to ensure system reliability and performance.
  • Build and optimize a robust monitoring and observability system using essential telemetry data such as logs, metrics, and traces.
  • Monitor system health effectively through observability platforms to maintain optimal system performance.
  • Apply Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets to improve system reliability and performance.
  • Manage incidents effectively, run war rooms for critical situations, and conduct blameless postmortems to learn from failures.
  • Design reliable system architectures, including load balancing, auto-scaling, and applying the CAP theorem for system resilience.
  • Minimize user impact during software deployments by using release management strategies and ensuring progressive rollouts.
  • Monitor your CI/CD pipeline to detect issues early and ensure smooth, efficient deployments.
  • Implement SRE practices within your organization, including setting up a central SRE team and conducting Production Readiness Reviews to ensure systems are always production ready.

Who Should Take This Course:

This course is designed for Site Reliability Engineers, DevOps engineers, application support engineers, software engineers and architects, as well as managers and directors of software engineering teams.

About the Instructor

Karun Subramanian is an IT operations expert focusing on modernizing monitoring and observability. With more than 20 years of experience, Karun has helped numerous companies transform their IT operations ecosystem. His expertise includes log aggregation, time series databases, cloud infrastructure, and machine data analytics. He is a Splunk Certified Architect. Karun is the author of the book Practical Splunk Search Processing Language: A Guide for Mastering SPL Commands for Maximum Efficiency and Outcome.


About Pearson Video Training:

Pearson publishes expert-led video tutorials covering a wide selection of technology topics designed to teach you the skills you need to succeed. These professional and personal technology videos feature world-leading author instructors published by your trusted technology brands: Addison-Wesley, Cisco Press, Pearson IT Certification, Prentice Hall, Sams, and Que. Topics include IT Certification, Network Security, Cisco Technology, Programming, Web Development, Mobile Development, and more. Learn more about Pearson Video training at http://www.informit.com/video.


Publisher Resources

ISBN: 9780135415016