A philosophy of duct-tape outage resolution
The latest ideas, practices, and trends for improving your business through operations.
Shaping jobs for service efficiency in shared computing environments
Best Practices for Site Reliability Engineers
Tracy Ferrell and Phil Beevers on the principles of Site Reliability Engineering and successful SRE teams.
Get a basic understanding of serverless, then go deeper with recommended resources.
Automated pet herding for fun and profit.
Every business that relies on technology is facing an infrastructure upheaval.
Java is ready for the cloud today.
Get a basic understanding of Kubernetes and then go deeper with recommended resources.
Tim Craig and Gustavo Franco on establishing robust and well-supported incident response processes.
Google SRE Stephen Thorne shares best practices for starting an SRE team at your company.
How SREs can use a hierarchy for mature alerts.
Organizations that want all of the speed, agility, and savings the cloud provides are embracing a cloud-native approach.
Our most-used AWS resources will help you stay on track in your journey to learn and apply AWS.
Get a basic understanding of distributed systems and then go deeper with recommended resources.
O’Reilly’s new survey reveals the latest operations salary trends, and the skill sets that will keep your operations career on track.
This collection of serverless resources will get you up to speed on the basics and best practices.
A new report examines the state of infrastructure and anticipated near-term developments through the eyes of infrastructure experts.
Kris Beevers examines the trade-offs between risk and velocity faced by any high-growth, critical-path technology business.
Dave Rensin explains why DevOps and SRE make each other better.
Laurent Gil shares the latest cybersecurity research findings based on real-world security operations.
Using advanced Docker Compose features to solve problems in larger projects and teams.
Poll results reveal where and why organizations choose to use containers, cloud platforms, and data pipelines.
Get a basic understanding of site reliability engineering (SRE) and then go deeper with recommended resources.