A Case Study in Community-Driven Software Adoption

And I saw Sisyphus at his endless task raising his prodigious stone with both his hands...and the sweat ran off him and the steam rose after him.

Homer, The Odyssey, Book XI

Google has had a difficult time getting Site Reliability Engineering (SRE) teams to standardize on tools and processes. We’re not unique in this regard. When an SRE organization reaches a certain size, two things happen: standardization begins to matter, and you discover it’s difficult.

Over a 10-year period, Google’s SRE organization tried to get all of its teams onto one standard rollout tool, chosen from the dozens of internal tools that were available. The organization tried different approaches, designating a different tool as the “standard” year after year, with varying degrees of success.

Meanwhile, Sisyphus—a community-driven rollout tool that the organization didn’t officially support—quietly became the de facto standard. Not only did Sisyphus have no leadership support, it was actively discouraged. Sisyphus’s creators themselves said its code quality was poor and its development had been bewilderingly chaotic. It had no schedule, no funded staff, no project management, and the design documentation was written months after the tool was operational. Yet after nine years of the dozens of different automated rollout tools developed at Google, Sisyphus was used by virtually all SRE teams and many developer teams.

As part of an internal history of automation ...

Get A Case Study in Community-Driven Software Adoption now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.