Tim Craig and Gustavo Franco discuss establishing robust and well-supported incident response processes.
Drawing from technology, finance, sports, social psychology, and complexity theory, Everett Harper looks at the key practices that are crucial for solving our most critical challenges.
Bob Quillin outlines how the cloud native community can reduce complexity, be more inclusive to all teams, and create a more open, multicloud future.
Bridget Kromhout looks over the cloud native landscape and talks about what’s new, what’s next, and what you need to get started with Kubernetes right now.
Liz Fong-Jones says management of complex distributed systems requires changing who's involved in production, how they collaborate, and how success is measured.
Drawing inspiration from restorative justice practices and her own journey of healing, Alex Qin offers a hopeful vision for how we can come together and co-create the world we yearn for.
Yaniv Aknin dives into the secret sauce for a successful SRE organization: high-quality measurements of reliability.
Experts explore cloud native infrastructure, SRE, distributed systems, and more.
Modern distributed systems are immensely different from distributed systems of just a decade ago. Lena Hall looks at how our approaches and practices progress with time.
Jessica Kerr argues that most programming careers aren’t about writing software; they’re about changing it.
Lachlan Evenson and Bridget Kromhout discuss the journey to build Gatekeeper, a community-driven approach for enforcing policy on any Kubernetes cluster.
Chen Goldberg shares how Kubernetes, Istio, GKE, and Anthos can help build distributed systems and happy teams.
Google SRE Stephen Thorne shares best practices for starting an SRE team at your company.
How SREs can use a hierarchy for mature alerts.
Survey results reveal the path organizations face as they integrate cloud native infrastructure and harness the full power of the cloud.
Organizations that want all of the speed, agility, and savings the cloud provides are embracing a cloud native approach.
From artificial intelligence to serverless to Kubernetes, here’s what’s on our radar.
Our most-used AWS resources will help you stay on track in your journey to learn and apply AWS.
Get a basic understanding of distributed systems and then go deeper with recommended resources.
Kris Nova looks at the new era of the cloud native space and the kernel that has made it all possible: Kubernetes.
Claire Janisch looks at some of the best biomimicry opportunities inspired by nature’s software and wetware.
Jane Adams examines the ways data-driven recruiting fails to achieve intended results and perpetuates discriminatory hiring practices.
Martin Kleppmann shows how recent computer science research is helping develop the abstractions and APIs for the next generation of applications.
Watch highlights from expert talks covering Kubernetes, chaos engineering, deep learning, and more.
Crystal Hirschorn discusses how organizations can benefit from combining established tech practices with incident planning, post-mortem-driven development, chaos engineering, and observability.
Omoju Miller outlines a vision where we harness human action for a better future.
Katrina Owen says the valuable skills that experienced professionals lack are at the vital margins of their careers.
Anne Currie says excessive and dirty energy use in data centers is one of the biggest ethical issues facing the tech industry.
O’Reilly’s new survey reveals the latest operations salary trends and the skill sets that will keep your operations career on track.
This collection of serverless resources will get you up to speed on the basics and best practices.
A new report examines the state of infrastructure and anticipated near-term developments through the eyes of infrastructure experts.
Michael Bernstein offers an unflinching look at some of the fallacies that developers believe about marketing.
Laura Thomson shares Mozilla’s approach to data ethics, review, and stewardship.
Roger Magoulas shares insights from O'Reilly's online learning platform that point toward shifts in the systems engineering ecosystem.
Jaana Dogan explains why Google teaches its tracing tools to new employees and how it helps them learn about Google-scale systems end to end.
Tammy Butow explains how companies can use Chaos Days to focus on controlled chaos engineering.
Anil Dash asks: How could our processes and tools be designed to undo the biggest bugs and biases of today’s tech?
Laurent Gil shares the latest cybersecurity research findings based on real-world security operations.
Kris Beevers examines the trade-offs between risk and velocity faced by any high-growth, critical path technology business.
Francesc Campoy Flores explores ways machine learning can help developers be more efficient.
Jessica McKellar draws parallels between the free and open source software movement and the work to end mass incarceration.
Kavya Joshi says performance theory offers a rigorous and practical approach to performance tuning and capacity planning.
Watch highlights from expert talks covering DevOps, SRE, security, machine learning, and more.
Dave Rensin explains why DevOps and SRE make each other better.
Using advanced Docker Compose features to solve problems in larger projects and teams.
Poll results reveal where and why organizations choose to use containers, cloud platforms, and data pipelines.
Get a basic understanding of site reliability engineering (SRE) and then go deeper with recommended resources.
Achieve high-impact systems monitoring by focusing on latency, errors, throughput, utilization, and blackbox monitoring.
Get advice and insight from speakers who have tackled the challenges you face.
O’Reilly Media Podcast: George Miranda discusses the benefits and challenges of a service mesh, and the best ways to get started using one.
Learn why this new tool is a critical component in microservice-based architectures.
Dave Andrews explains how to wield the power of a global 50 Tbps application delivery network to ensure maximum availability during and after a change.
Julia Grace shares how she learned to rapidly scale herself and her leadership team during a period of hypergrowth at Slack.
David Hayes explains why adding a manageable dose of actionable intelligence to your operations management workflow can save you time and aggravation.
Oracle’s Kyle York and Netra’s Richard Lee discuss Netra’s high-performance computing environment.
Kyle Kingsbury explores anomalies in three distributed systems and shares strategies for correctness testing using Jepsen.
Bryan Liles explains how to evaluate and integrate new declarative application management practices into continuous integration pipelines.
Nicole Forsgren shares results and stories behind high-performing technology-driven teams and organizations.
Javier Garza details the ingredients you need to build and deliver an app your users will love.
Tamar Bercovici details how the team at Box has constructed its database stack to handle an ever-growing query load and data set.