Chapter 5. Working with Third Parties Shouldn’t Suck

Over the years, the definition of Site Reliability Engineering (SRE) has evolved, but arguably the easiest to digest is “what happens when software engineering is tasked with what used to be called ‘operations.’”1 Most Site Reliability teams think of operations as the applications running on their own infrastructure. These days, however, more and more companies rely on third parties for specific functions in which those vendors specialize. These include Domain Name System (DNS), Content Delivery Network (CDN), Application Performance Management (APM), Storage, Payments, Email, Messaging (SMS), Security (such as Single Sign-On [SSO] or Two-Factor Authentication [2FA]), Log Processing, and more. Any one of these services, if not implemented properly, is a dependency that can bring down your site.

Are vendors black boxes that we don’t control? Not necessarily. When we work with vendors, it’s important to apply the same suite of SRE disciplines that we apply to our own systems, in an effort to make the experience suck less.

Build, Buy, or Adopt?

Before we dive into the topic of working with vendors, we should discuss the decisions that would lead us to buy over build or adopt. Our level of involvement in this process will depend on the combination of importance and stakeholders. Determining importance is the first step in this entire process and will ...
