Chapter 1. Shifting the Security Team to a DevOps Mindset
During the early days of the shift away from Waterfall development, I was incredibly fortunate to be in the position of building the security team at Etsy while it was one of the first companies pioneering DevOps. At the time, for most companies, production application changes were typically made every 6 to 18 months. However, as I learned on my first day as head of security, Etsy was making production code deployments 20 times per day and rising. As you can imagine—and I had to learn the hard way—most of the classic approaches to security simply weren’t going to survive in this environment.
I knew that the approach to security needed to change, but implementing a DevOps-friendly model effectively and in ways that achieved buy-in throughout the organization hadn’t really been done before. First and foremost, like many security professionals, I had to stop thinking of security as a gatekeeper or blocker, which is a holdover from the Waterfall methodologies. As my thinking changed, I could begin seeing how the security team could change to be more DevOps friendly and thus maintain good security practices while focusing on enabling business agility.
In this report for fellow security leaders owning a security transformation, I share the lessons I learned about building and scaling a program along the way—lessons that would have saved me from a bunch of pain had I known them from the beginning. I use details about Etsy because that’s the environment I know best. However, none of these lessons learned are meant to be specific to Etsy and should apply to any company going through the often intertwined journey to DevOps and the cloud. In most large organizations I’ve spoken with, at a high level “DevOps” and “cloud” often describe the same larger journey of modernizing their development and delivery process out of the Waterfall and datacenter world. Obviously, these terms can have very specific technical meanings, as well; however, for the high-level purposes of this report, I use them both in combination and occasionally interchangeably to refer to this larger modernization journey.
At first, only the most leading-edge tech companies seemed likely to embrace DevOps and the cloud. However, the benefits to the business have been so tremendous that now even the largest Fortune 100 companies are making the jump. Indeed, digital transformation is one of their largest strategic priorities.
How DevOps and the Cloud Change the Challenges Security Teams Face
When building or modernizing a security team, your starting point is important. Instead of looking at building a team from a checklist perspective, I focus on adapting a security team to the way that technology development and deployment are actually achieved within the company.
When you look at building a team from this perspective, you need to consider three main ways in which technology and organizations are changing:
Whereas application and infrastructure changes used to be a very long, drawn-out process taking months or even years, these changes now happen weekly, daily, or even in the span of minutes.
As development and operations functions increasingly merge into DevOps roles, significantly more people typically have access to production.
Risk has shifted from the infrastructure layer up to the application layer, and for many groups of attackers, the cost of attack has dropped to almost nothing.
These changes affect security at all layers, from application security to network security. For application security, you need to focus on providing developers with the visibility and tooling so that they can own the security of the applications and services that they control. From the operations (or network security) side, the focus shifts from blocking access outright to providing visibility and alerting coverage that identifies when any of that access is being abused (or has been compromised by an attacker). Finally, the lens through which you look at both of these efforts is the Attack-Driven Defense model. In this model, you make sure your defensive efforts align with how your attackers are likely to actually attack you.
The Problems with Waterfall
To understand how a DevOps-friendly model for your security team can work, it helps to understand the roots of our thinking about security. In the old Waterfall method of development and deployment, companies deployed code in a slow, linear manner, sort of like going from point A to point B along the Oregon Trail.1 Developers loaded up their wagon with a number of grandfather clocks, which were sort of like their code, as illustrated in Figure 1-1. Then, they started down the trail and at certain points had to ford the QA (quality assurance) river, hunt for bugs, or things like that. Next they fed the code back into staging and maybe it survived and reached the final destination (going live on production). Maybe not.
In a Waterfall model, code can take weeks, months, or even years before it makes it to production. However, because businesses want to be able to iterate faster than that to deliver new products and features, these timeframes have become unacceptable. Just as the business is adapting, your security team will need to adapt to these new timeframes, as well.
Developing and Iterating on Production: A Perspective Shift
As code-deployment cycles have shrunk from years to months, to weeks, to sometimes even days or hours, the way in which these cycles picked up speed is important. Etsy pushed code deployments to production 20 times per day on average, and each one of those deployments could contain change sets from numerous developers. To pick up speed, code stopped moving from one group to another group to another group, with every group signing off on the code and providing feedback.
Instead, developers write some code and then push it to production. Each developer becomes their own QA department, security department, and performance team. Developers own that code deployment into production. That shift is huge.
This shift doesn’t mean that untested code goes live to users as soon as a developer thinks it’s ready. You still need a process to ensure the code doesn’t introduce issues. In a DevOps model, the techniques used include feature flags, ramp ups, and A/B testing (all discussed in more detail a little later) to test the code and slowly release new features to users.
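To make the ramp-up idea concrete, here is a minimal sketch of a percentage-based feature flag. It is not Etsy's implementation; the flag name `new_checkout`, the `RAMP_PERCENT` table, and both helper functions are hypothetical names chosen for illustration. The sketch assumes each user has a stable ID and hashes it so that the same user always lands in the same bucket as the percentage is raised.

```python
import hashlib

# Hypothetical flag table: flag name -> percentage of users (0-100) who see the feature.
RAMP_PERCENT = {"new_checkout": 10}


def bucket(flag: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag: str, user_id: str) -> bool:
    """Return True if the user falls inside the flag's current ramp-up percentage.

    Unknown flags default to 0 percent, so a feature is off unless explicitly ramped.
    """
    return bucket(flag, user_id) < RAMP_PERCENT.get(flag, 0)
```

Raising the value in `RAMP_PERCENT` from 10 to 50 to 100 releases the feature gradually, and setting it back to 0 turns the feature off without a rollback. Because the decision is made on the server from the user ID, it does not depend on anything the client sends.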
But from the security perspective, because so much more is now done on production, someone could still access feature code that’s not production-ready by simply adding some flag in the URL. So, coming from a security perspective, I was initially deeply skeptical of (if not downright horrified by) the idea of developers owning the entire process and iterating potentially untested code on production. In fact, in Figure 1-2, I’ve included a picture of my face on day one of landing in this DevOps environment.
I was skeptical because, historically, we as security professionals have always thought of security in terms of control. And back in the days of Waterfall, things like sign-off made sense. The thinking was, “Okay, you’ll do the security review and then get the okay that the code is safe.” But the world has changed, and just like any sort of world-changing journey, this shift to DevOps felt really scary at first. For anyone coming from Waterfall, of course it’s scary having all this code on production before the security team ever sees that code.
However, if done right, this DevOps system isn’t nearly as dangerous as I first feared; it’s actually the opposite. So now I write books and dispense the Kool-Aid about this because I finally drank some of it myself. Why did I change my thinking? After an embarrassingly long time, I realized this fundamental truth: no matter which development methodology you use, vulnerabilities will occur. But, with its focus on allowing frequent deployments to production, only DevOps gives you the speed to react to those vulnerabilities faster than your attackers.
Focusing on Mean Time to Reaction
Every practical software development methodology will result in vulnerabilities. The thought that “Oh, security needs to go sign off on all this and then it will be magically safe,” isn’t necessarily true. We still ship code that has bugs in it all the time. So, instead of focusing on sign-off and control, a security team in a DevOps environment needs to focus on the ability to respond quickly. The security team needs to reduce that time from weeks to days to hours to minutes just as the developers have reduced deployment time.
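One way to track this is to measure the gap between when an issue is reported and when its fix reaches production. The following is a minimal sketch under stated assumptions: the function name `mean_time_to_reaction` and the two sample incidents are made up for illustration, and a real team would pull the timestamps from its own ticketing and deployment systems.

```python
from datetime import datetime, timedelta
from statistics import mean


def mean_time_to_reaction(incidents):
    """Average the gap between report time and fix-deployed time.

    `incidents` is a list of (reported_at, fix_deployed_at) datetime pairs.
    """
    gaps = [(fixed - reported).total_seconds() for reported, fixed in incidents]
    return timedelta(seconds=mean(gaps))


# Illustrative sample values only, not real incident data.
incidents = [
    (datetime(2024, 1, 3, 9, 0), datetime(2024, 1, 3, 11, 0)),     # fixed in 2 hours
    (datetime(2024, 1, 10, 14, 0), datetime(2024, 1, 10, 18, 0)),  # fixed in 4 hours
]

print(mean_time_to_reaction(incidents))  # 3:00:00
```

Watching this number over time shows whether the team's ability to respond is actually shrinking from weeks toward hours.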
In my experience working in Waterfall methodologies, the SysOps team deploys code to production after a number of months, and then more often than not everything breaks. Then, some part of the system is down for about 48 hours while everyone’s running around and trying to figure out what went wrong. This is because when you release only once or twice a year, changing the application is incredibly complex. Imagine building an airliner without ever testing any of the components along the way but just throwing them together and then trying to fly and test the entire thing at once.
The same problem occurs with security. When an (inevitable) critical vulnerability is discovered that requires an emergency patch, it’s an organizational nightmare to get that patch to production because it’s not part of the normal deployment schedule. For example, when I was at Etsy, we were working with a third-party vendor to fix a cross-site scripting (XSS) vulnerability in the main page of its application. The vendor’s CTO got on the thread and told us, “We understand how serious this is to you. We will rush an emergency fix in six weeks.” Because the vendor was still in a Waterfall methodology, this timeframe was the fastest it could ship a trivial one-line fix to its application. Although a response time like six weeks used to be considered normal in the industry, in the shift to the DevOps/cloud world, customers increasingly expect security issues to be fixed much faster.
However, after the security team at Etsy shifted to a DevOps-friendly model, deploying every day and developing and iterating on production made fixing security issues much easier, too. When an issue occurs that your security team needs to address, you can just do it. When you’re already deploying every day, there’s no such thing as an out-of-band patch. Releasing the patch is just another deploy.
If you stop reading at this point and take only one thing from this report, I hope it is the lesson that took me a long time to learn as a Chief Information Security Officer (CISO) going through the shift to DevOps/cloud: the goal of a modern security team shifts from acting as a gatekeeper to making the rest of the organization self-sufficient in security.
1 The Oregon Trail was an educational video game in which the player assumed the role of a wagon train leader traveling from Missouri to Oregon in 1848. The game was popular in the 1980s and 1990s.