Chapter 2. Case Study 1: Moonshot
In this chapter, we discuss a large-scale project called Moonshot. We share several examples of tools, processes, and techniques that pushed the project forward and conclude with a postmortem of lessons learned from the project.
Overview
In 2010, senior Storage SRE leadership announced that the Moonshot project would soon be underway. The project required teams to migrate all of the company’s systems from GFS to its successor, Colossus, by the end of 2011. At the time, Colossus was still a prototype, and this was the largest data migration in Google’s history. The mandate was so ambitious that people dubbed the project Moonshot. As an internal newsletter to engineers put it:
If migrating all of our data in 2010 still sounds like a pretty aggressive schedule, well, yes it is! Will there be problems such as minor outages? Probably. However, the Storage teams and our senior VPs believe that it’s worth the effort and occasional hiccup, and there are plenty of incentives for early adopters, including reduced quota costs, better performance, and lots of friendly SRE support.
The initial communication completely undersold the effort, complexity, and difficulty of this project. In reality, it took a full two years to migrate all of Google’s services from GFS to Colossus.
GFS was designed in 2001 as Google’s first cluster-level file system. It supported many petabytes of storage and allowed thousands of servers to interact ...