Chapter 4. Security as Code: Security Tools and Practices in Continuous Delivery
Security as Code is about building security into DevOps tools and practices, making it an essential part of the tool chains and workflows. You do this by mapping out how changes to code and infrastructure are made and finding places to add security checks and tests and gates without introducing unnecessary costs or delays.
Security as Code uses Continuous Delivery as the control backbone and the automation engine for security and compliance. Let’s begin by briefly defining Continuous Delivery, and then walk through the steps on how to build security into Continuous Delivery.
Agile ideas and principles—working software over documentation, frequent delivery, face-to-face collaboration, and a focus on technical excellence and automation—form the foundation of DevOps. And Continuous Delivery, which is the control framework for DevOps, is also built on top of a fundamental Agile development practice: Continuous Integration.
In Continuous Integration, each time a developer checks in a code change, the system is automatically built and tested, providing fast and frequent feedback on the health of the code base. Continuous Delivery takes this to the next step.
Continuous Delivery is not just about automating the build and unit testing, which are things that the development team already owns. Continuous Delivery is provisioning and configuring test environments to match production as closely as possible—automatically. This includes packaging the code and deploying it to test environments; running acceptance, stress, and performance tests, as well as security tests and other checks, with pass/fail feedback to the team, all automatically; and auditing all of these steps and communicating status to a dashboard. Later, you use the same pipeline to deploy the changes to production.
Continuous Delivery is the backbone of DevOps and the engine that drives it. It provides an automated framework for making software and infrastructure changes, pushing out software upgrades, patches, and changes to configuration in a way that is repeatable, predictable, efficient, and fully audited.
Putting a Continuous Delivery pipeline together requires a high degree of cooperation between developers and operations, and a much greater shared understanding of how the system works, what production really looks like, and how it runs. It forces teams to begin talking to one another, exposing and exploring details about how they work and how they want to work.
There is a lot of work that needs to be done: understanding dependencies, standardizing configurations, and bringing configuration into code; cleaning up the build (getting rid of inconsistencies, hardcoding, and jury rigging); putting everything into version control—application code and configuration, binary dependencies, infrastructure configuration (recipes, manifests, playbooks, CloudFormation templates, and Dockerfiles), database schemas, and configurations for the Continuous Integration/Continuous Delivery pipeline itself; and, finally, automating testing (getting all of the steps for deployment together and automating them carefully). And you may need to do all of this in a heterogeneous environment, with different architectures and technology platforms and languages.
Continuous Delivery at London Multi-Asset Exchange
The London Multi-Asset Exchange (LMAX) is a highly regulated FX retail market in the United Kingdom, where Dave Farley (coauthor of the book Continuous Delivery) helped pioneer the model of Continuous Delivery.
LMAX’s systems were built from scratch following Agile best practices: TDD, pair programming, and Continuous Integration. But they took this further, automatically deploying code to integration, acceptance, and performance testing environments, building up a Continuous Delivery pipeline.
LMAX has made a massive investment in automated testing. Each build runs through 25,000 unit tests with code coverage failure, simple code analysis (using tools like FindBugs, PMD, and custom architectural dependency checks) and automated integration sanity checks. All of these tests and checks must pass for every piece of code submitted.
The last good build is automatically picked up and promoted to integration and acceptance testing, during which more than 10,000 end-to-end tests are run on a test cluster, including API-level acceptance tests, multiple levels of performance tests, and fault injection tests that selectively fail parts of the system and verify that the system recovers correctly without losing data. More than 24 hours’ worth of tests are run in parallel in less than 1 hour.
If all of the tests and reviews pass, the build is tagged. All builds are kept in a secure repository, together with dependent binaries (like the Java Runtime). Code and tests are tracked in version control.
QA can take a build to conduct manual exploratory testing or other kinds of tests. Operations can then pull a tagged build from the development repository to their separate secure production repository and use the same automated tools to deploy to production. Releases to production are scheduled every two weeks, on a Saturday, outside of trading hours.
This is Continuous Delivery, not Continuous Deployment as followed at Amazon or Etsy. But it still takes advantage of the same type of automation and controls, even though LMAX created a lot of the tooling on its own using scripts and simple workflow conventions, before today’s DevOps tools were available.
Injecting Security into Continuous Delivery
Before you can begin adding security checks and controls, you need to understand the workflows and tools that the engineering teams are using:
What happens before and when a change is checked in?
Where are the repositories? Who has access to them?
How do changes transition from check-in to build to Continuous Integration and unit testing, to functional and integration testing, and to staging and then finally to production?
What tests are run? Where are the results logged?
What tools are used? How do they work?
What manual checks or reviews are performed and when?
And how can you take advantage of all of this for security and compliance purposes?
Let’s map out the steps involved from taking a change from check-in to production and identify where we can insert security checks and controls. See Figure 4-1 for a model that explains how and where to add security checks into a Continuous Delivery workflow.
Precommit
These are the steps up to the point at which a change to software or configuration is checked in to the source code repo. Additional security checks and controls to be added here include the following:
Lightweight, iterative threat modeling and risk assessments
Static analysis (SAST) checking in the engineer’s IDE
Peer code reviews (for defensive coding and security vulnerabilities)
Commit Stage (Continuous Integration)
This is automatically triggered by a check-in. In this stage, you build and perform basic automated testing of the system. These steps return fast feedback to developers: did this change “break the build”? This stage needs to complete in at most a few minutes. Here are the security checks that you should include in this stage:
Compile and build checks, ensuring that these steps are clean, and that there are no errors or warnings
Software Component Analysis in build, identifying risk in third-party components
Incremental static analysis scanning for bugs and security vulnerabilities
Alerting on high-risk code changes through static analysis checks or tests
Automated unit testing of security functions, with code coverage analysis
Digitally signing binary artifacts and storing them in secure repositories1
Acceptance Stage
This stage is triggered by a successful commit. The latest good commit build is picked up and deployed to an acceptance test environment. Automated acceptance (functional, integration, performance, and security) tests are executed. To minimize the time required, these tests are often fanned out to different test servers and executed in parallel. Following a “fail fast” approach, the more expensive and time-consuming tests are left until as late as possible in the test cycle, so that they are only executed if other tests have already passed.
Security controls and tests in this stage include the following:
Secure, automated configuration management and provisioning of the runtime environment (using tools like Ansible, Chef, Puppet, Salt, and/or Docker). Ensure that the test environment is clean and configured to match production as closely as possible.
Automatically deploy the latest good build from the binary artifact repository.
Smoke tests (including security tests) designed to catch mistakes in configuration or deployment.
Targeted dynamic scanning (DAST).
Automated functional and integration testing of security features.
Automated security attacks, using Gauntlt or other security tools.
Deep static analysis scanning (can be done out of band).
Fuzzing (of APIs, files). This can be done out of band.
Manual pen testing (out of band).
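The “fail fast” ordering described for this stage can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a real pipeline engine: the `Check` type, the check names, and the cost estimates are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Check:
    name: str                 # hypothetical check name
    cost_minutes: float       # rough estimate of how long the check takes
    run: Callable[[], bool]   # returns True when the check passes


def run_fail_fast(checks: List[Check]) -> Tuple[bool, List[str]]:
    """Run the cheapest checks first and stop at the first failure."""
    executed = []
    for check in sorted(checks, key=lambda c: c.cost_minutes):
        executed.append(check.name)
        if not check.run():
            return False, executed   # more expensive checks are never started
    return True, executed


# Placeholder checks; a real pipeline would call out to actual tools here.
checks = [
    Check("targeted_dast_scan", 30.0, lambda: True),
    Check("smoke_tests", 2.0, lambda: True),
    Check("security_feature_tests", 10.0, lambda: False),
]

passed, executed = run_fail_fast(checks)
print(passed, executed)
```

In this run the 30-minute scan is never started, because a cheaper check has already failed.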
Production Deployment and Post-Deployment
If all of the previous steps and tests pass, the change is ready to be deployed to production, pending manual review/approvals and scheduling (in Continuous Delivery) or automatically (in Continuous Deployment). Additional security checks and controls are needed in production deployment and post-deployment:
Secure, automated configuration management and provisioning of the runtime environment
Automated deployment and release orchestration (authorized, repeatable, and auditable)
Post-deployment smoke tests
Automated runtime asserts and compliance checks (monkeys)
Blameless postmortems (learning from failure)
Depending on the risk profile of your organization and systems, you will need to implement at least some of these practices and controls. Leaders in this space already do most of them.
Now, let’s look more closely at these security controls and practices and some of the tools that you can use, starting with design.
Secure Design in DevOps
Secure design in DevOps begins by building on top of secure libraries and frameworks—building security in upfront and trying to make it invisible to developers. Security risk assessments also need to be integrated into design as it changes and as part of managing the software supply chain: the open source and third-party components and frameworks that teams use to assemble important parts of any system.
Risk Assessments and Lightweight Threat Modeling
We’ve already looked at the essential problem of design in rapidly moving DevOps environments. These teams want to deliver to real users early and often so that they can refine the feature set and the design in response to production feedback. This means that the design must be lightweight at the outset, and it is constantly changing based on feedback.
In Continuous Deployment, there is no Waterfall handoff of design specifications to coders—there may not be any design specifications at all that can be reviewed as part of a risk assessment. When there is minimal design work being done, and “the code is the design,” where and how do you catch security problems in design?
You begin upfront by understanding that even if the design is only roughed out and subject to change, the team still needs to commit to a set of tools and the runtime stack to get going. This is when threat modeling—looking at the design from an attacker’s perspective, searching for gaps or weaknesses in security controls and defenses—needs to start.
At PayPal, for example, every team must go through an initial risk assessment, filling out an automated risk questionnaire whenever it begins work on a new app or microservice.2 One of the key decision points is whether the team is using existing languages and frameworks that have already been vetted.3 Or, are they introducing something new to the organization, technologies that the security team hasn’t seen before? There is a big difference in risk between “just another web or mobile app” built on an approved platform, and a technical experiment using new languages and tools.
Here are some of the issues to understand and assess in an upfront risk review:
Do you understand how to use the language(s) and frameworks safely? What security protections are offered in the framework? What needs to be added to make it simple for developers to “do the right thing” by default?
Is there good Continuous Delivery toolchain support for the language(s) and framework, including SAST checking or IAST tooling, and dependency management analysis capabilities to catch vulnerabilities in third-party and open source libraries?
Is sensitive and confidential data being used? What data is needed, how is it to be handled, and what needs to be audited? Does it need to be stored, and, if so, how? Do you need to make considerations for encryption, tokenization and masking, access control, and auditing?
Do you understand the trust boundaries between this app/service and others: where do controls need to be enforced around authentication, access control, and data quality? What assumptions are being made in the design?
These questions are especially important in microservices environments, in which teams push for the flexibility to use the right tool for the job. In these environments it is not always easy to understand call-chain dependencies: you can’t necessarily control who calls you or what callers expect from your service, and you can’t control what downstream services do, or when or how they will be changed. For microservices, you need to understand the following:
What assumptions are you making about callers? Where and how is authentication and authorization done? How can you be sure?
Can you trust the data that you are getting from another service? Can other services trust the data that you are providing to them?
What happens if a downstream service fails, or times out, or returns an incomplete or inconsistent result?
After the upfront assessment, threat modeling should become much simpler for most changes, because most changes will be small, incremental, and therefore low risk. You can assess risk inexpensively, informally, and iteratively by getting the team to ask itself a few questions as it is making changes:
Are you changing anything material about the tooling or stack? Are you introducing or switching to a new language, changing the backend store, or upgrading or replacing your application framework? Because design is done fast and iteratively, teams might find that their initial architectural approach does not hold up, and they might need to switch out all or part of the technology platform. This can require going back and reassessing risk from the start.
How are you affecting the attack surface of the system? Are you just adding a new field or another form? Or, are you opening up new ports or new APIs, adding new data stores, making calls out to new services?
Are you changing authentication logic or access control rules or other security plumbing?
Are you adding data elements that are sensitive or confidential? Are you changing code that has anything to do with secrets or sensitive or confidential data?
Answering these questions will tell you when you need to look more closely at the design or technology, or when you should review and verify trust assumptions. The key to threat modeling in DevOps is recognizing that because design and coding and deployment are done continuously in a tight, iterative loop, you will be caught up in the same loops when you are assessing technical risks. This means that you can make—and you need to make—threat modeling efficient, simple, pragmatic, and fast.
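One way to keep this kind of incremental assessment fast is to capture the questions as a small checklist that can be filled in alongside a change. The sketch below is a hypothetical illustration: the `ChangeSummary` fields and the wording of the reasons are assumptions made for this example, not part of any particular tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChangeSummary:
    # One flag per question the team asks itself; field names are hypothetical.
    changes_stack_or_tooling: bool = False
    expands_attack_surface: bool = False
    touches_auth_or_access_control: bool = False
    touches_sensitive_data_or_secrets: bool = False


def review_reasons(change: ChangeSummary) -> List[str]:
    """Return the reasons a change needs a closer look; empty means low risk."""
    reasons = []
    if change.changes_stack_or_tooling:
        reasons.append("stack or tooling changed: reassess risk from the start")
    if change.expands_attack_surface:
        reasons.append("new ports, APIs, data stores, or outbound calls")
    if change.touches_auth_or_access_control:
        reasons.append("authentication or access control logic changed")
    if change.touches_sensitive_data_or_secrets:
        reasons.append("sensitive data or secrets handling changed")
    return reasons


small_change = ChangeSummary()                            # e.g. a new form field
risky_change = ChangeSummary(expands_attack_surface=True,
                             touches_sensitive_data_or_secrets=True)

print(review_reasons(small_change))
print(review_reasons(risky_change))
```

An empty list means that the change can proceed without extra review; any entry is a prompt to revisit the design or the trust assumptions.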
Securing Your Software Supply Chain
Another important part of building in security upfront is to secure your software supply chain, minimizing security risks in the software upon which your system is built. Today’s Agile and DevOps teams take extensive advantage of open source libraries to reduce development time and costs. This means that they also inherit quality problems and security vulnerabilities from other people’s code.
According to Sonatype, which runs the Central Repository, the world’s largest repository for open source software, “80 percent of the code in today’s applications comes from libraries and frameworks,” and a lot of this code has serious problems in it. Sonatype looked at 17 billion download requests from 106,000 different organizations in 2014. Here’s what it found:
Large software and financial services companies are using an average of 7,600 suppliers. These companies sourced an average of 240,000 software “parts” in 2014, of which 15,000 included known vulnerabilities.
More than 50,000 of the software components in the Central Repository have known security vulnerabilities. One in every 16 download requests is for software that has at least one known security vulnerability. On average, 50 new critical vulnerabilities in open source software are reported every day.
Scared yet? You should be. You need to know what open source code is included in your apps and when this changes, and review this code for known security vulnerabilities.
Luckily, you can do this automatically by using Software Component Analysis (SCA) tools like OWASP’s Dependency Check project or commercial tools like Sonatype’s Nexus Lifecycle or SourceClear. You can wire these tools into your build or Continuous Integration/Continuous Delivery pipeline to automatically inventory open source dependencies, identify out-of-date libraries and libraries with known security vulnerabilities, and fail the build automatically if serious problems are found. By building up a bill of materials for every system, you can prepare for vulnerabilities like Heartbleed or DROWN—you can quickly determine if you are exposed and what you need to fix.
These tools also can alert you when new dependencies are detected so that you can create a workflow to review them.
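The core of such a build gate can be sketched as a comparison between a bill of materials and a list of known advisories. This is a simplified illustration, not how any of the tools named above work internally: the component names, the `EXAMPLE-…` advisory IDs, the severity scores, and the threshold of 7.0 are all made up for the example.

```python
from typing import Dict, List, NamedTuple


class Advisory(NamedTuple):
    advisory_id: str   # made-up identifier, not a real CVE
    component: str
    version: str
    severity: float    # assumed 0-10 scale, higher is worse


def blocking_advisories(bom: Dict[str, str],
                        advisories: List[Advisory],
                        fail_at: float = 7.0) -> List[Advisory]:
    """Return advisories that match an exact component version in the bill
    of materials and meet or exceed the severity threshold."""
    return [a for a in advisories
            if bom.get(a.component) == a.version and a.severity >= fail_at]


# Bill of materials: component name -> version in this build (all hypothetical).
bom = {
    "example-web-framework": "2.3.1",
    "example-logging": "1.0.4",
    "example-xml-parser": "0.9.0",
}

advisories = [
    Advisory("EXAMPLE-0001", "example-web-framework", "2.3.1", 9.8),
    Advisory("EXAMPLE-0002", "example-logging", "1.0.4", 3.1),
    Advisory("EXAMPLE-0003", "example-xml-parser", "0.8.0", 8.0),
]

blockers = blocking_advisories(bom, advisories)
build_should_fail = bool(blockers)
print(build_should_fail, [a.advisory_id for a in blockers])
```

Keeping the bill of materials as data is also what makes it possible to answer "are we exposed?" quickly when a new vulnerability is announced.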
If you are using containers like Docker in production (or even in development and test) you should enforce similar controls over dependencies in container images. Even though Docker’s Project Nautilus scans images in official repos for packages with known vulnerabilities, you should ensure that all Docker containers are scanned, using a tool like OpenSCAP or Clair, or commercial services from Twistlock, FlawCheck, or Black Duck Hub.
Your strategic goal should be to move to “fewer, better suppliers” over time, simplifying your supply chain in order to reduce maintenance costs and security risks. Sonatype has developed a free calculator that will help developers and managers understand the cost and risks that you inherit over time from using too many third-party components.4
But you need to recognize that even though it makes good sense in the long term, getting different engineering teams to standardize on using a set of common components won’t be easy, especially for microservices environments in which developers are granted the freedom to use the right tools for the job, selecting technologies based on their specific requirements, or even on their personal interests.
Begin by standardizing on the lowest layers—the kernel, OS, and VMs—and on general-purpose utility functions like logging and metrics collection, which need to be used consistently across apps and services.
Writing Secure Code in Continuous Delivery
DevOps practices emphasize the importance of writing good code: code that works and that is easy to change. You can take advantage of this in your security program, using code reviews and adding automated static analysis tools to catch common coding mistakes and security vulnerabilities early.
Using Code Reviews for Security
Peer code reviews are a common engineering practice at many Agile and DevOps shops, and they are mandatory at leading organizations like Google, Amazon, Facebook, and Etsy.
Peer code reviews are generally done to share information across the team and to help ensure that code is maintainable, to reinforce conventions and standards, and to ensure that frameworks and patterns are being followed correctly and consistently. This makes code easier to understand and safer to change, and in the process, reviewers often find bugs that need to be fixed.
You can also use code reviews to improve security in some important ways.
First, code reviews increase developer accountability and provide transparency into changes. Mandatory reviews ensure that a change can’t be pushed out without at least one other person being aware of what was done and why it was done. This significantly reduces the risk of insider threats; for example, someone trying to introduce a logic bomb or a back door in the code. Just knowing that their code will be reviewed also encourages developers to be more careful in their work, improving the quality of the code.
Transparency into code reviews can be ensured using code review tools like
Frameworks and other high-risk code including security features (authentication workflows, access control, output sanitization, crypto) and code that deals with money or sensitive data require careful, detailed review of the logic. This code must work correctly, including under error conditions and boundary cases.
Encouraging developers to look closely at error and exception handling and other defensive coding practices, including careful parameter validation, will go a long way to improving the security of most code as well as improving the runtime reliability of the system.
With just a little training, developers can learn to look out for bad practices like hardcoding credentials or attempts at creating custom crypto. With more training, they will be able to catch more vulnerabilities, early on in the process.
In some cases (for example, session management, secrets handling, or crypto), you might need to bring in a security specialist to examine the code. Developers can be encouraged to ask for security code reviews. You can also identify high-risk code through simple static code scanning, looking for specific strings such as credentials and dangerous functions like crypto functions and crypto primitives.
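A simple string scan of this kind might look like the following sketch. The two patterns are illustrative assumptions only; a real rule set would need tuning for your languages and naming conventions, and it will produce both false positives and misses.

```python
import re
from typing import List, Tuple

# Illustrative patterns only; these are assumptions for the example.
PATTERNS = {
    "hardcoded-credential": re.compile(
        r"""(?i)(password|passwd|secret|api_key)\w*\s*=\s*["'][^"']+["']"""
    ),
    "crypto-usage": re.compile(r"(?i)\b(md5|sha1|des|aes|cipher)\b"),
}


def flag_high_risk_lines(source: str) -> List[Tuple[int, str]]:
    """Return (line number, label) pairs for lines that match a pattern."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((line_no, label))
    return hits


# The sample is scanned as text; it is never executed.
sample = "\n".join([
    'db_password = "hunter2"',
    "digest = hashlib.md5(data).hexdigest()",
    "total = price * quantity",
])

print(flag_high_risk_lines(sample))
# prints [(1, 'hardcoded-credential'), (2, 'crypto-usage')]
```

The hits are not findings in themselves; they mark code that deserves a closer review by someone with security expertise.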
To identify high-risk code, Netflix maps out call sequences for microservices. Any services that are called by many other services or that fan out to many other services are automatically tagged as high risk. At Etsy, as soon as high-risk code is identified through reviews or scanning, they hash it and create a unit test that automatically alerts the security team when the code hash value has been changed.
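The idea of hashing reviewed code and alerting when the hash changes can be sketched as follows. This is not Etsy's implementation, which is not described here; the function names and the in-memory source snippet are assumptions, and in practice the approved hash would be a constant recorded at review time and the source would be read from the repository.

```python
import hashlib


def fingerprint(source: bytes) -> str:
    """Hash the exact bytes of a piece of high-risk code."""
    return hashlib.sha256(source).hexdigest()


def needs_security_review(current_source: bytes, approved_hash: str) -> bool:
    """True when the code no longer matches what was last reviewed."""
    return fingerprint(current_source) != approved_hash


# Hypothetical high-risk code as it looked when security last reviewed it.
REVIEWED_SOURCE = (
    b"def check_password(user, password):\n"
    b"    return verify(user, password)\n"
)
# Computed here for the demo; normally stored alongside the test.
APPROVED_HASH = fingerprint(REVIEWED_SOURCE)

# A later change that quietly weakens the check.
modified_source = REVIEWED_SOURCE.replace(b"verify(user, password)", b"True")

print(needs_security_review(REVIEWED_SOURCE, APPROVED_HASH))
print(needs_security_review(modified_source, APPROVED_HASH))
```

Wrapped in a unit test, a mismatch can fail the build or send a notification, so a change to this code cannot pass unnoticed.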
Code review practices also need to be extended to infrastructure code—to Puppet manifests and Chef cookbooks and Ansible playbooks, Dockerfiles, and CloudFormation templates.
What About Pair Programming?
Pair programming, where developers write code together, one developer “driving” at the keyboard, and the other acting as the navigator, helping to guide the way and looking out for trouble ahead, is a great way to bring new team members up to speed, and it is proven to result in better, tighter, and more testable code. But pairing will miss important bugs, including security vulnerabilities, because pair programming is more about joint problem solving, navigating toward a solution rather than actively looking for mistakes or hunting for bugs.
Even in disciplined XP environments, you should do separate security-focused code reviews for high-risk code.
SAST: in IDE, in Continuous Integration/Continuous Delivery
Another way to improve code security is by scanning code for security vulnerabilities using automated static application security testing (SAST) tools. These tools can find subtle mistakes that reviewers will sometimes miss, and that might be hard to find through other kinds of testing.
But rather than relying on a centralized security scanning factory run by infosec, DevOps organizations like Twitter and Netflix implement self-service security scanning for developers, fitting SAST scanning directly into different places along the engineering workflow.
Developers can take advantage of built-in checkers in their IDE, using plug-ins like FindBugs or Find Security Bugs, or commercial plug-ins from Coverity, Klocwork, HPE Fortify, Checkmarx, or Cigital’s SecureAssist to catch security problems and common coding mistakes as they write code.
You can also wire incremental static analysis precommit and commit checks into Continuous Integration to catch common mistakes and antipatterns quickly by only scanning the code that was changed. Full system scanning might still be needed to catch interprocedural problems that some incremental scans can’t find. You will need to run these scans, which can take several hours or sometimes days to run on a large code base, outside of the pipeline. But the results can still be fed back to developers automatically, into their backlog or through email or other notification mechanisms.
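The first step of an incremental scan, working out which changed files are worth passing to the analyzer, can be sketched like this. The extension list is a hypothetical example, and the input is assumed to have the one-path-per-line shape that `git diff --name-only` produces.

```python
from typing import Iterable, List

# Hypothetical list of file types that this pipeline knows how to scan.
SCANNABLE_EXTENSIONS = (".java", ".rb", ".py")


def files_to_scan(changed_paths: Iterable[str]) -> List[str]:
    """Keep only changed source files that the static analyzer understands."""
    selected = set()
    for path in changed_paths:
        path = path.strip()
        if path and path.endswith(SCANNABLE_EXTENSIONS):
            selected.add(path)
    return sorted(selected)


# Example input, shaped like the output of `git diff --name-only`.
diff_output = """src/Login.java
README.md
app/models/user.rb
src/Login.java
"""

print(files_to_scan(diff_output.splitlines()))
# prints ['app/models/user.rb', 'src/Login.java']
```

Only the selected files go to the fast commit-time scan; the full-system scan still runs on its own schedule outside the pipeline.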
Different kinds of static code scanning tools offer different value:
Tools that check for code consistency, maintainability, and clarity (PMD and Checkstyle for Java, Ruby-lint for Ruby) help developers to write code that is easier to understand, easier to change, easier to review, and safer to change.
Tools that look for common coding bugs and bug patterns (tools like FindBugs and RuboCop) will catch subtle logic mistakes and errors that could lead to runtime failures or security vulnerabilities.
Tools that identify security vulnerabilities through taint analysis, control flow and data flow analysis, pattern analysis, and other techniques (Find Security Bugs, Brakeman) can find many common security issues such as mistakes in using crypto functions, configuration errors, and potential injection vulnerabilities.
You should not rely on only one tool—even the best tools will catch only some of the problems in your code. Good practice would be to run at least one of each kind to look for different problems in the code, as part of an overall code quality and security program.
There are proven SAST tools available today for popular languages like Java, C/C++, and C#, as well as for common frameworks like Struts and Spring and .NET, and even for some newer languages and frameworks like Ruby on Rails. But it’s difficult to find tool support for other new languages such as Golang, and it’s especially difficult to find for dynamic scripting languages. Most static analyzers for these languages, especially open source tools, are still limited to linting and basic checking for bad practices, which helps to make for better code but isn’t enough to ensure that your code is secure.
Static analysis checking tools for configuration management languages (like Foodcritic for Chef or puppet-lint for Puppet) are also limited to basic checking for good coding practices and some common semantic mistakes. They help to ensure that the code works, but they won’t find serious security problems in your system configuration.
To ensure that the feedback loops are effective, it’s important to tune these tools to minimize false positives and provide developers with clear, actionable feedback on real problems that need to be fixed. Noisy checkers that generate a lot of false positives and that need review and triage can still be run periodically and the results fed back to development after they have been picked through.
Security Testing in Continuous Delivery
Security testing needs to be moved directly into Continuous Integration and Continuous Delivery in order to verify security as soon as changes are made. This could mean wiring application scanning and fuzzing into the Continuous Delivery pipeline. It could also mean taking advantage of work that the development team has already done to create an automated test suite, adding security checks into unit testing, and automating security attacks as part of integration and functional testing.
Although you need to run penetration tests and bug bounty programs outside of Continuous Delivery, they still provide valuable feedback into the automated testing program. You need to track all of the vulnerabilities found in scanning and testing, both inside and outside of the pipeline, in a vulnerability manager.
Dynamic Scanning (DAST)
Black box Dynamic Application Security Testing (DAST) tools and services are useful for testing web and mobile apps, but they don’t always play nicely in Continuous Integration or Continuous Delivery. Most DAST tools are designed to be run by security analysts or pen testers, not a Continuous Integration engine like Jenkins or Bamboo.
You can use tools like OWASP ZAP to automatically scan a web app for common vulnerabilities as part of the Continuous Integration/Continuous Delivery pipeline. You can do this by running the scanner in headless mode through the command line, through the scanner’s API, or by using a wrapper of some kind, such as the ZAProxy Jenkins plug-in or a higher-level test framework like BDD-Security (which we’ll look at in a later section).
There is no definitive guidance (yet... the ZAP project team is working on some) on how to best integrate scanning into Continuous Delivery—you’ll need to explore this on your own. You can try to spider the app (if it is small enough), but it generally makes more sense in Continuous Integration and Continuous Delivery to target your scans in order to reduce the amount of time needed to execute the tests and minimize the amount of noise created. You can do this by proxying automated functional regression tests executed by tools like Selenium through the scanner in order to map out and navigate key forms and fields to be scanned. Then, invoke the scanner’s API and instruct the scanner to execute its fuzzing attacks.
Then, you will need to pick through the results, filter out the background noise, and determine what results constitute a pass or fail and whether you need to stop the pipeline. As with static analysis tools, you will need to tune dynamic scans to minimize false positives. You will want to set “the bug bar” high enough to ensure that you are not wasting the development team’s time.
You will also need to remove duplicate findings after each scan. Tools like Code Dx or ThreadFix (which we will also look at in a later section) can help you to do this.
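The triage steps described here (applying a bug bar, dropping known false positives, and removing duplicates) can be sketched generically. This is not how Code Dx or ThreadFix work; the `Finding` shape, the 1-to-4 severity scale, and the rule and location names are assumptions made for the example.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Finding(NamedTuple):
    rule: str        # scanner rule name (hypothetical)
    location: str    # URL or file path where the issue was reported
    severity: int    # assumed scale: 1 = low ... 4 = critical


def triage(findings: Iterable[Finding],
           bug_bar: int,
           suppressed: Set[Tuple[str, str]]) -> List[Finding]:
    """Drop duplicates, known false positives, and anything below the bar."""
    seen: Set[Tuple[str, str]] = set()
    kept = []
    for finding in findings:
        key = (finding.rule, finding.location)
        if key in seen or key in suppressed or finding.severity < bug_bar:
            continue
        seen.add(key)
        kept.append(finding)
    return kept


raw = [
    Finding("sql-injection", "/search", 4),
    Finding("sql-injection", "/search", 4),        # duplicate from a rescan
    Finding("missing-header", "/static/logo", 1),  # below the bug bar
    Finding("xss-reflected", "/help", 3),          # reviewed false positive
    Finding("xss-reflected", "/profile", 3),
]
suppressed = {("xss-reflected", "/help")}

blocking = triage(raw, bug_bar=3, suppressed=suppressed)
pipeline_should_stop = bool(blocking)
print(pipeline_should_stop, blocking)
```

Raising or lowering `bug_bar` is the tuning knob: too low and the team drowns in noise, too high and real problems pass through.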
Like static analysis scans, dynamic analysis checking takes time, and will probably need to be spun off to run in parallel with other tests, or even done out of band.
Fuzzing and Continuous Delivery
Another testing technique that can be valuable in finding security vulnerabilities (especially injection bugs) is fuzzing. Fuzzing is a brute-force reliability testing technique wherein you create and inject random data into a file or API in order to intentionally cause errors and then see what happens. Fuzz testing is important in embedded systems development for which the costs of mistakes are high, and it has been a fundamental part of application security testing programs at Microsoft, Facebook, Adobe, and Google.5 Fuzz testing tools are also commonly used by security researchers to hunt for bugs.
However, fuzzing, like scanning, doesn’t fit nicely into Continuous Integration and Continuous Delivery automation for a number of reasons:
Fuzz tests are generally not predictable or repeatable. The nature of fuzzing is to try random things and see what happens.
The results of fuzz testing are also not predictable. The system might crash or some kind of exception might occur that can leave the system in an undefined state for future testing.
The results of fuzzing can be difficult to assess and understand and might require a manual reviewer to identify and qualify problems and recognize when multiple problems have the same cause.
Good fuzzing takes time—hours or days—to complete and lots of CPU cycles, which makes it difficult to fit into the time box of Continuous Delivery.
Some newer fuzzing tools are designed to run (or can be adapted to run) in Continuous Integration and Continuous Delivery. They let you seed test values to create repeatable tests, set time boxes on test runs, detect duplicate errors, and write scripts to automatically set up/restore state in case the system crashes. But you might still find that fuzz testing is best done out of band.
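To make these ideas concrete, here is a toy fuzzer showing a seeded generator, a time box, and de-duplication of errors. It is a sketch under stated assumptions, not a substitute for a real fuzzing tool: `parse_quantity` is a hypothetical target with a deliberately planted bug, and the alphabet, case count, and time limit are arbitrary.

```python
import random
import time
from typing import Dict

ALPHABET = "0123456789-x "


def parse_quantity(text: str) -> int:
    """Hypothetical target under test, with a deliberate bug."""
    if text[0] == "-":          # bug: IndexError when text is empty
        raise ValueError("negative quantities are rejected")
    return int(text)            # ValueError on malformed input is expected


def fuzz(seed: int, max_cases: int = 500, max_seconds: float = 2.0) -> Dict[str, str]:
    """Return one sample input per distinct unexpected exception type."""
    rng = random.Random(seed)               # seeded, so runs are repeatable
    deadline = time.monotonic() + max_seconds
    crashes: Dict[str, str] = {}
    for _ in range(max_cases):
        if time.monotonic() > deadline:     # time box on the run
            break
        length = rng.randint(0, 3)
        case = "".join(rng.choice(ALPHABET) for _ in range(length))
        try:
            parse_quantity(case)
        except ValueError:
            continue                        # clean rejection, not a finding
        except Exception as exc:            # anything else is a finding
            # Keep only the first input seen for each exception type.
            crashes.setdefault(type(exc).__name__, case)
    return crashes


first = fuzz(seed=1234)
second = fuzz(seed=1234)
print(first, first == second)
```

Because the generator is seeded, a failure found in one run can be reproduced in the next, which is what makes the results usable as a pipeline check.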
Security in Unit and Integration Testing
Continuous Integration and Continuous Delivery and especially practices like Behavior-Driven Development (BDD) and Test-Driven Development (TDD)—wherein developers write tests before they write the code—encourage developers to create a strong set of automated tests to catch mistakes and protect themselves from regressions as they add new features or make changes or fixes to the code.
Most of these tests will be positive, happy-path tests which prove that features work as expected. This is the way that most developers think and it is what they are paid to do. But this means that they often miss testing for edge cases, exceptions, and negative cases that could catch security problems.
And most of these automated tests, following automated “testing pyramid” conventions, will be low-level unit tests, written by developers to validate detailed logic within a method or function. Unit tests are important in catching regressions as you refactor code or make other changes, but they won’t find mistakes or misunderstandings made in calling functions or services, like not calling a function at all, which are a common cause of security vulnerabilities and can only be caught in higher-level functional or integration tests.
For security code and framework code and for other high-risk functions, you should convince developers to step off the happy path and write good unit and functional and integration tests around—and especially outside—boundary conditions. They need to test error handling and exception handling logic, and write negative tests: sanity tests that should never pass unless something has gone wrong. Insist on high levels of automated test coverage for high-risk code.
Spend some time with the team to come up with abuse(r) or “evil user” stories, “misuse cases” that explore how a bad user could try to take advantage of a feature, or what could happen if they stray off of the main success scenarios. This doesn’t necessarily require specialized security knowledge; you can accomplish a lot by just asking:
But, what happens if the user doesn’t...?
Then, write negative tests for these cases, tests which prove that unauthenticated users can’t access admin functions, that a user can’t see or change information for a different account, that they can’t tamper with a return value, and so on. If these tests fail, something is seriously broken and you want to learn about it as early as possible.
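Negative tests of this kind can be sketched with Python's standard `unittest` module. The `AccountService` class below is a hypothetical stand-in for the system under test, and the user names and balances are made up for illustration; only the shape of the tests matters.

```python
import unittest


class Forbidden(Exception):
    """Raised when a caller is not allowed to perform an action."""


class AccountService:
    """Hypothetical stand-in for the application under test."""

    def __init__(self):
        self._balances = {"alice": 100, "bob": 50}
        self._admins = {"root"}

    def get_balance(self, caller, account):
        # Only the account owner may read the balance.
        if caller is None or caller != account:
            raise Forbidden(f"{caller!r} may not read {account!r}")
        return self._balances[account]

    def delete_account(self, caller, account):
        # Admin-only function.
        if caller not in self._admins:
            raise Forbidden("admin only")
        del self._balances[account]


class NegativeSecurityTests(unittest.TestCase):
    def setUp(self):
        self.svc = AccountService()

    def test_unauthenticated_user_cannot_use_admin_function(self):
        with self.assertRaises(Forbidden):
            self.svc.delete_account(caller=None, account="bob")

    def test_user_cannot_read_another_account(self):
        with self.assertRaises(Forbidden):
            self.svc.get_balance(caller="alice", account="bob")

    def test_owner_can_still_read_own_account(self):
        self.assertEqual(self.svc.get_balance("alice", "alice"), 100)
```

The last test is a positive control: it guards against "fixing" the negative tests by denying everyone.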
Even with these tests in place, you should still go further and try to attack your system. Bad guys are going to attack your system, if they haven’t done so already. You should try to beat them to it.
There are a few test frameworks that are designed to make this easy and that behave well in Continuous Integration and Continuous Delivery:
Using one of these tools, you will be able to set up and run a basic set of targeted automated pen tests against your system as part of your automated test cycle.
Just as with automating integration testing or acceptance testing, it will take a while to build up a strong set of security tests in Continuous Delivery. Begin by building a post-deployment security smoke test, a basic regression test that will run in acceptance testing and in production to catch common and important security problems, and to ensure that security configurations are correct.
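One small check that could go into such a smoke test is confirming that responses carry the security headers you expect. The sketch below works on a plain dictionary of response headers so that it does not assume any particular HTTP client; the list of required headers is an assumption that you would adapt to your own policy.

```python
# Assumed policy for illustration: headers every response should carry.
REQUIRED_HEADERS = (
    "Strict-Transport-Security",
    "X-Content-Type-Options",
    "Content-Security-Policy",
)


def missing_security_headers(headers: dict) -> list:
    """Return the required headers that are absent (names are case-insensitive)."""
    present = {name.lower() for name in headers}
    return [h for h in REQUIRED_HEADERS if h.lower() not in present]
```

A smoke test would fail the deployment step if this function returns a non-empty list for a key page of the application.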
Pen Testing and Bug Bounties
Manual penetration testing is not effective as a control gate in Continuous Delivery or Continuous Deployment. The velocity of delivery is too fast, and pen tests take too long to set up, run, and review.
But there is still important value in pen testing out-of-band from the Continuous Delivery pipeline, not only to satisfy mandatory compliance requirements. More important, you can use the results of pen testing to validate your security program, highlighting strengths and weaknesses.
Good pen testing is exploratory and creative—unlike most of the automated testing in Continuous Delivery, which is intended to catch the same kinds of mistakes in design and coding and configuration, over and over. A good pen tester will help you to find problems that you wouldn’t otherwise have known to look for or known how to find.
The real value in these tests is not in the bugs that they find; it’s in the information that the bugs provide you, if you look deep enough. Where did the bug come from? Why did you miss finding it yourself? How did it get there in the first place? What do we need to change or improve to prevent problems like this from happening again?
The same principle applies to Bug Bounties, which are part of the security programs at leading organizations like Google, Etsy, Netflix, and Facebook. Enlisting the community of security researchers to find security and reliability bugs in your software gives you access to creativity and skills that you couldn’t afford otherwise. They will find some important bugs. Fixing these bugs will make your system safer.
But, more importantly, they will provide you with information on where you need to improve your design, coding, reviews, testing, and configuration management. This is information that you can use to get better as an organization and to build better and safer systems.
Infosec needs their own view into the pipeline and into the system, and across all of the pipelines and systems and portfolios, to track vulnerabilities, assess risk, and understand trends. You need metrics for compliance and risk-management purposes, to understand where you need to prioritize your testing and training efforts and to assess your application security program.
Collecting data on vulnerabilities lets you ask some important questions:
How many vulnerabilities have you found?
How were they found? What tools or testing approaches are giving you the best returns?
What are the most serious vulnerabilities?
How long are they taking to get fixed? Is this getting better or worse over time?
You can get this information by feeding security testing results from your Continuous Delivery pipelines into a vulnerability manager, such as Code Dx or ThreadFix.
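To make the questions above concrete, here is a minimal sketch of how such metrics could be computed once findings are collected in one place. The `Finding` record and its fields are assumptions made for illustration; they do not reflect the data model of Code Dx, ThreadFix, or any other vulnerability manager.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Finding:
    tool: str                      # which scanner or test found it
    severity: str                  # for example "high", "medium", "low"
    found: date
    fixed: Optional[date] = None   # None while the finding is still open


def found_by_tool(findings):
    """How were they found? Count findings per tool."""
    return Counter(f.tool for f in findings)


def open_by_severity(findings):
    """What is still open, grouped by severity?"""
    return Counter(f.severity for f in findings if f.fixed is None)


def mean_days_to_fix(findings):
    """Average days from discovery to fix, over fixed findings only."""
    days = [(f.fixed - f.found).days for f in findings if f.fixed is not None]
    return sum(days) / len(days) if days else None
```

Tracking `mean_days_to_fix` per month or per release is one simple way to see whether remediation is getting better or worse over time.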
Securing the Infrastructure
In Continuous Delivery, the same practices, automated workflows, and controls that are used to build and deliver secure code are used to secure the infrastructure:
Managing configuration as code (checking code into version control, ensuring that it is reviewed, scanning it for common mistakes)
Building hardening policies into configuration code by default
Using the Continuous Delivery pipeline to automatically test, deploy, and track configuration changes
Securing the Continuous Delivery pipeline itself
Let’s look at these ideas in some more detail.
Automated Configuration Management
Code-driven configuration management tools like Puppet, Chef, and Ansible make it easy to set up standardized configurations across hundreds of servers using common templates, minimizing the risk that hackers can exploit one unpatched server, and letting you minimize any differences between production, test, and development environments. All of the configuration information for the managed environments is visible in a central repository and under version control. This means that when a vulnerability is reported in a software component like OpenSSL, it is easy to identify which systems need to be patched. And it is easy to push patches out, too.
These tools also provide some host-based intrusion-detection capabilities and give you control over configuration drift: they continuously and automatically audit runtime configurations to ensure that every system matches the master configuration definition, issue alerts when something is missing or wrong, and can automatically correct it.
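The idea behind drift detection can be sketched in a few lines of Python. This is not how Puppet, Chef, or Ansible implement it; it only shows the comparison of a master definition against what is observed at runtime, using flat dictionaries of settings as an assumed, simplified representation.

```python
def find_drift(expected: dict, actual: dict) -> dict:
    """Return {setting: (expected_value, actual_value)} for every difference.

    A value of None means the setting is absent on that side.
    """
    drift = {}
    for key in expected.keys() | actual.keys():
        want = expected.get(key)
        have = actual.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift
```

A configuration management agent would then alert on each entry or reset the setting to its expected value.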
Security should be baked in to Amazon Machine Images (AMIs) and other configuration templates. Puppet manifests, Chef cookbooks, Ansible playbooks, and Dockerfiles should be written and reviewed with security in mind. Unit tests for configuration code should include security checks such as the following:
Ensure that unnecessary services are disabled
Ensure that ports that do not need to be open are indeed not open
Look for hardcoded credentials and secrets
Review permissions on sensitive files and directories
Ensure that security tools like OSSEC or AIDE are installed and set up correctly
Ensure that development tools are not installed in production servers
Check auditing and logging policies and configurations
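As an example of one of these checks, the following sketch scans configuration text for hardcoded credentials. The patterns are illustrative assumptions and far from complete; dedicated secret scanners use much larger rule sets and will catch cases that this misses.

```python
import re

# Illustrative patterns only; a real scanner would use a much larger set.
SECRET_PATTERNS = [
    # key: value or key = value assignments with a literal value
    re.compile(r"(?i)(password|passwd|secret|api_key|token)\s*[:=]\s*['\"]?[^\s'\"]{4,}"),
    # the general shape of an AWS access key ID
    re.compile(r"AKIA[0-9A-Z]{16}"),
    # PEM private key header
    re.compile(r"-----BEGIN (RSA |EC )?PRIVATE KEY-----"),
]


def find_hardcoded_secrets(text: str) -> list:
    """Return (line_number, line) pairs that match any secret pattern."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(p.search(line) for p in SECRET_PATTERNS):
            hits.append((number, line.strip()))
    return hits
```

Run as a unit test over manifests, cookbooks, playbooks, and Dockerfiles, a check like this fails the build when the returned list is not empty.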
Build standard hardening steps into your recipes instead of using scripts or manual checklists. This includes minimizing the attack surface by removing all packages that aren’t needed and that have known problems; and changing default configurations to be safe.
Security standards like the Center for Internet Security (CIS) benchmarks and NIST configuration checklists can be burned into Puppet and Chef and Ansible specifications. There are several examples of Puppet modules and Chef cookbooks available to help harden Linux systems against CIS benchmarks and the Defense Information Systems Agency Security Technical Implementation Guides.
Securing Your Continuous Delivery Pipeline
It’s important not only to secure the application and its runtime environment, but to secure the Continuous Delivery tool chain and build and test environments, too. You need to have confidence in the integrity of delivery and the chain of custody, not just for compliance and security reasons, but also to ensure that changes are made safely, repeatably, and traceably.
Your Continuous Delivery tool chain is also a dangerous attack target itself: it provides a clear path for making changes and pushing them automatically into production. If it is compromised, attackers have an easy way into your development, test, and production environments. They could steal data or intellectual property, inject malware anywhere into the environment, DoS your systems, or cripple your organization’s ability to respond to an attack by shutting down the pipeline itself.
Continuous Delivery and Continuous Deployment effectively extend the attack surface of your production system to your build and automated test and deployment environment.
You also need to protect the pipeline from insider attacks by ensuring that all changes are fully transparent and traceable from end to end, that a malicious and informed insider cannot make a change without being detected, and that they cannot bypass any checks or validations.
Do a threat model on the Continuous Delivery pipeline. Look for weaknesses in the setup and controls, and gaps in auditing or logging. Then, take steps to secure your configuration management environment and Continuous Delivery pipeline:
Harden the systems that host the source and build artifact repositories, the Continuous Integration and Continuous Delivery server(s), and the systems that host the configuration management, build, deployment, and release tools. Ensure that you clearly understand—and control—what is done on-premises and what is in the cloud.
Harden the Continuous Integration and/or Continuous Delivery server. Tools like Jenkins are designed for developer convenience and are not secure by default. Ensure that these tools (and the required plug-ins) are kept up-to-date and tested frequently.
Lock down and harden your configuration management tools. See “How to be a Secure Chef,” for example.
Ensure that keys, credentials, and other secrets are protected. Get secrets out of scripts and source code and plain-text files and use an audited, secure secrets manager like Chef Vault, Square’s KeyWhiz project, or HashiCorp Vault.
Secure access to the source and binary repos and audit access to them.
Implement access control across the entire tool chain. Do not allow anonymous or shared access to the repos, to the Continuous Integration server, or configuration manager or any other tools.
Change the build steps to sign binaries and other build artifacts to prevent tampering.
Periodically review the logs to ensure that they are complete and that you can trace a change through from start to finish. Ensure that the logs are immutable, that they cannot be erased or forged.
Ensure that all of these systems are monitored as part of the production environment.
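The step about signing build artifacts can be illustrated with Python's standard `hmac` and `hashlib` modules. Note the swapped-in technique: this sketch uses a keyed hash (HMAC) with a shared secret to keep the example self-contained, whereas real code signing typically uses asymmetric keys and a code-signing certificate. The key handling shown here is an assumption; in practice the key would come from a secrets manager, not from source code.

```python
import hashlib
import hmac


def sign_artifact(data: bytes, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag for the artifact contents."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_artifact(data: bytes, key: bytes, signature: str) -> bool:
    """Check the tag using a constant-time comparison."""
    return hmac.compare_digest(sign_artifact(data, key), signature)
```

The build step would record the tag next to the artifact, and the deploy step would refuse any artifact for which `verify_artifact` returns `False`.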
Security in Production
Security doesn’t end after systems are in production. In DevOps, automated security checks, continuous testing, and monitoring feedback loops are integral parts of production operations.
Runtime Checks and Monkeys
If you are going to allow developers to do self-service, push-button deploys to production and you can’t enforce detailed reviews of each change, you will need to add some runtime checking to catch oversights or shortcuts. This is what Jason Chan at Netflix calls moving “from gates to guardrails”.
After each deploy, check that engineers used templates properly and that they didn’t make a fundamental mistake in configuration or deployment that could open up the system to attack or make it less reliable under failure.
This is why Netflix created the Simian Army, a set of automated runtime checks and tests, including the famous Chaos Monkey.
Chaos Monkey, Chaos Gorilla, and Chaos Kong check that the system is set up and designed correctly to handle failures by randomly injecting failures into the production runtime, as part of Netflix’s approach to Chaos Engineering.
The other monkeys are rule-driven compliance services that automatically monitor the runtime environment to detect changes and to ensure that configurations match predefined definitions. They look for violations of security policies and common security configuration weaknesses (in the case of Security Monkey) or configurations that do not meet predefined standards (Conformity Monkey). They run periodically online, notifying the owner(s) of the service and infosec when something looks wrong. The people responsible for the service need to investigate and correct the problem, or justify the situation.
Security Monkey captures details about changes to policies over time. It also can be used as an analysis and reporting tool and for forensics purposes, letting you search for changes across time periods and across accounts, regions, services, and configuration items. It highlights risks like changes to access control policies or firewall rules.
Similar tools include Amazon’s AWS Inspector, which is a service that provides automated security assessments of applications deployed on AWS, scans for vulnerabilities, and checks for deviations from best practices, including rules for PCI DSS and other compliance standards. It provides a prioritized list of security issues along with recommendations on how to fix them.
Although checks like this are particularly important in a public cloud environment like the one Netflix operates in, where changes are constantly being made by developers, the same ideas can be extended to any system. Always assume that mistakes can and will be made, and check to ensure that the system setup is correct any time a change is made. You can write your own runtime asserts:
Check that firewall rules are set up correctly
Verify files and directory permissions
Check sudo rules
Confirm SSL configurations
Ensure that logging and monitoring services are working correctly
Run your security smoke test every time the system is deployed, in test and in production.
Tools like Puppet and Chef will automatically and continuously scan infrastructure to detect variances from the expected baseline state and alert or automatically revert them.
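A runtime assert for file and directory permissions, one of the checks listed earlier in this section, could look like the following sketch. It assumes a POSIX system, and the default set of forbidden permission bits (group-writable, or any access for others) is an illustrative policy rather than a recommendation for every file.

```python
import os
import stat


def too_permissive(path: str, forbidden_bits: int = stat.S_IWGRP | stat.S_IRWXO) -> bool:
    """True if the file grants any of the forbidden permission bits."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return bool(mode & forbidden_bits)


def check_permissions(paths) -> list:
    """Return the subset of paths whose permissions violate the policy."""
    return [p for p in paths if too_permissive(p)]
```

Calling `check_permissions` on a list of sensitive files after each deploy, and alerting when the result is not empty, turns the manual review into a repeatable check.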
Situational Awareness and Attack-Driven Defense
DevOps values production feedback and emphasizes the importance of measuring and monitoring production activity. You can extend the same approaches, and the same tools, to security monitoring, involving the entire team instead of just the SOC, making security metrics available in the context of the running system, and graphing and visualizing security-related data to identify trends and anomalies.
Recognize that your system is, or will be, under constant attack. Take advantage of the information that this gives you. Use this information to identify and understand attacks and the threat profile of the system.
Attacks take time. Move to the left of the kill chain and catch them in the early stages. You will reduce the Mean Time to Detect (MTTD) attacks by taking advantage of the close attention that DevOps teams pay to feedback from production, and adding security data into these feedback loops. You also will benefit by engaging people who are closer to the system: the people who wrote the code and keep the system running, who understand how it is supposed to work, what normal looks like, and when things aren’t normal.
Feed this data back into your testing and your reviews, prioritizing your actions based on what you are seeing in production, in the same way that you would treat feedback from Continuous Integration or A/B testing in production. This is real feedback, not theoretical, so it should be acted on immediately and seriously.
This is what Zane Lackey at Signal Sciences calls “Attack-Driven Defense”. Information on security events helps you to understand and prioritize threats based on what’s happening now in production. Watching for runtime errors and exceptions and attack signatures shows where you are being probed and tested, what kind of attacks you are seeing, where they are attacking, where they are being successful, and what parts of the code need to be protected.
This should help drive your security priorities, tell you where you should focus your testing and remediation. Vulnerabilities that are never attacked (probably) won’t hurt you. But attacks that are happening right now need to be resolved—right now.
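A very small sketch of this kind of prioritization: given security events observed in production, rank the endpoints that are attracting the most attacks. The event format, a pair of endpoint and attack signature, is an assumption made for illustration; real events would come from your logs or monitoring tools.

```python
from collections import Counter


def rank_attacked_endpoints(events, top=3):
    """Rank endpoints by number of attack events.

    events: iterable of (endpoint, attack_signature) pairs seen in production.
    """
    counts = Counter(endpoint for endpoint, _signature in events)
    return counts.most_common(top)
```

The endpoints at the top of this list are candidates for the next round of testing, review, and hardening.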
If you can’t successfully shift security left, earlier into design and coding and Continuous Integration and Continuous Delivery, you’ll need to add more protection at the end, after the system is in production. Network IDS/IPS solutions, tools like Tripwire, and signature-based WAFs aren’t designed to keep up with rapid system and technology changes in DevOps. This is especially true for cloud IaaS and PaaS environments, for which there is no clear network perimeter and you might be managing hundreds or thousands of ephemeral instances across different environments (public, private, and hybrid), with self-service Continuous Deployment.
A number of cloud security protection solutions are available, offering attack analysis, centralized account management and policy enforcement, file integrity monitoring and intrusion detection, vulnerability scanning, micro-segmentation, and integration with configuration management tools like Chef and Puppet. Some of these solutions include the following:
Another kind of runtime defense technology is Runtime Application Security Protection/Self-Protection (RASP), which uses run-time instrumentation to catch security problems as they occur. Like application firewalls, RASP can automatically identify and block attacks. And like application firewalls, you can extend RASP to legacy apps for which you don’t have source code.
But unlike firewalls, RASP is not a perimeter-based defense. RASP instruments the application runtime code and can identify and block attacks at the point of execution. Instead of creating an abstract model of the code (like static analysis tools), RASP tools have visibility into the code and runtime context, and use taint analysis and data flow and control flow and lexical analysis techniques, directly examining data variables and statements to detect attacks. This means that RASP tools have a much lower false positive (and false negative) rate than firewalls.
You also can use RASP tools to inject logging and auditing into legacy code to provide insight into the running application and attacks against it. They trade off runtime overheads and runtime costs against the costs of making coding changes and fixes upfront.
There are only a small number of RASP solutions available today, mostly limited to applications that run in the Java JVM and .NET CLR, although support for other languages like Node.js, Python, and Ruby is emerging. These tools include the following:
Other runtime defense solutions take a different approach from RASP or firewalls. Here are a couple of innovative startups in this space that are worth checking out:
- tCell is a startup that offers application runtime immunity. tCell is a cloud-based SaaS solution that instruments the system at runtime and injects checks and sensors into control points in the running application: database interfaces, authentication controllers, and so on.
- It uses this information to map out the attack surface of the system and identifies when the attack surface is changed. tCell also identifies and can block runtime attacks based on the following:
  - Known bad patterns of behavior (for example, SQL injection attempts), like a WAF.
  - Threat intelligence and correlation: black-listed IPs, and so on.
  - Behavioral learning: recognizing anomalies in behavior and traffic. Over time, it identifies what is normal and can enforce normal patterns of activity, by blocking or alerting on exceptions.
- tCell works in Java, Node.js, Ruby on Rails, and Python (.NET and PHP are in development).
- Twistlock provides runtime defense capabilities for Docker containers in enterprise environments. Twistlock’s protection includes enterprise authentication and authorization capabilities—the Twistlock team is working with the Docker community to help implement frameworks for authorization (their authorization plug-in framework was released as part of Docker 1.10) and authentication, and Twistlock provides plug-ins with fine-grained access control rules and integration with LDAP/AD.
- Twistlock scans containers for known vulnerabilities in dependencies and configuration (including scanning against the Docker CIS benchmark). It also scans to understand the purpose of each container. It identifies the stack and the behavioral profile of the container and how it is supposed to act, creating a white list of expected and allowed behaviors.
- An agent installed in the runtime environment (also as a container) runs on each node, talking to all of the containers on the node and to the OS. This agent provides visibility into runtime activity of all the containers, enforces authentication and authorization rules, and applies the white list of expected behaviors for each container as well as a black list of known bad behaviors (like a malware solution).
- And because containers are intended to be immutable, Twistlock recognizes and can block attempts to change container configurations at runtime.
Learning from Failure: Game Days, Red Teaming, and Blameless Postmortems
Game Days—running real-life, large-scale failure tests (like shutting down a data center)—have also become common practices in DevOps organizations like Amazon, Google, and Etsy. These exercises can involve (at Google, for example) hundreds of engineers working around the clock for several days, to test out disaster recovery cases and to assess how stress and exhaustion could impact the organization’s ability to deal with real accidents.7
At Etsy, Game Days are run in production, even involving core functions such as payments handling. Of course, this raises the question, “Why not simulate this in a QA or staging environment?” Etsy’s response is, first, that the existence of any differences in those environments brings uncertainty to the exercise; second, that the risk of not recovering has no consequences during testing, which can bring hidden assumptions into the fault tolerance design and into recovery. The goal is to reduce uncertainty, not increase it.8
These exercises are carefully tested and planned in advance. The team brainstorms failure scenarios and prepares for them, running through failures first in test and fixing any problems that come up. Then, it’s time to execute scenarios in production, with developers and operators watching closely and ready to jump in and recover, especially if something goes unexpectedly wrong.
You can take many of the ideas from Game Days, which are intended to test the resilience of the system and the readiness of the DevOps team to handle system failures, and apply them to infosec attack scenarios through Red Teaming. This is a core practice at organizations like Microsoft, Facebook, Salesforce, Yahoo!, and Intuit for their cloud-based services.
Like operations Game Days, Red Team exercises are most effectively done in production.
The Red Team identifies weaknesses in the system that they believe can be exploited, and works as ethical hackers to attack the live system. They are generally given freedom to act short of taking the system down or damaging or exfiltrating sensitive data. The Red Team’s success is measured by the seriousness of the problems that they find, and their Mean Time to Exploit/Compromise.
The Blue Team is made up of the people who are running, supporting, and monitoring the system. Their responsibility is to identify when an attack is in progress, understand the attack, and come up with ways to contain it. Their success is measured by the Mean Time to Detect the attack and their ability to work together to come up with a meaningful response.
Here are the goals of these exercises:
Identify gaps in testing and in design and implementation by hacking your own systems to find real, exploitable vulnerabilities.
Exercise your incident response and investigation capabilities, identify gaps or weaknesses in monitoring and logging, in playbooks, and escalation procedures and training.
Build connections between the security team and development and operations by focusing on the shared goal of making the system more secure.
After a Game Day or Red Team exercise, just like after a real production outage or a security breach, the team needs to get together to understand what happened and learn how to get better. They do this in Blameless Postmortem reviews. Here, everyone meets in an open environment to go over the facts of the event: what happened, when it happened, how people reacted, and then what happened next. By focusing calmly and objectively on understanding the facts and on the problems that came up, the team can learn more about the system and about themselves and how they work, and they can begin to understand what went wrong, ask why things went wrong, and look for ways to improve, either in the way that the system is designed, or how it is tested, or in how it is deployed, or how it is run.
To be successful, you need to create an environment in which people feel safe to share information, be honest and truthful and transparent, and to think critically without being criticized or blamed—what Etsy calls a “Just Culture.” This requires buy-in from management down, understanding and accepting that accidents can and will happen, and that they offer an important learning opportunity. When done properly, Blameless Postmortems not only help you to learn from failures and understand and resolve important problems, but they can also bring people together and reinforce openness and trust, making the organization stronger.9
Security at Netflix
Netflix is another of the DevOps unicorns. Like Etsy, Amazon, and Facebook, it has built its success through a culture based on “Freedom and Responsibility” (employees, including engineers, are free to do what they think is the right thing, but they are also responsible for the outcome) and a massive commitment to automation, including in security—especially in security.
After experiencing serious problems running its own IT infrastructure, Netflix made the decision to move its online business to the cloud. It continues to be one of the largest users of Amazon’s AWS platform.
Netflix’s approach to IT operations is sometimes called “NoOps” because they don’t have operations engineers or system admins. They have effectively outsourced that part of their operations to Amazon AWS because they believe that data center management and infrastructure operations is “undifferentiated heavy lifting.” Or, put another way, work that is hard to do right but that does not add direct value to their business.
Here are the four main pillars of Netflix’s security program:10
- Undifferentiated heavy lifting and shared responsibility
- Netflix relies heavily on the capabilities of AWS and builds on or extends these capabilities as necessary to provide additional security and reliability features. It relies on its cloud provider for automated provisioning, platform vulnerability management, data storage and backups, and physical data center protections. Netflix built its own PaaS layer on top of this, including an extensive set of security checks and analytic and monitoring services. Netflix also bakes secure defaults into its base infrastructure images, which are used to configure each instance.
- Traceability in development
- Source control, code reviews through Git pull requests, and the Continuous Integration and Continuous Delivery pipeline provide a complete trace of all changes from check-in to deployment. Netflix uses the same tools to track information for its own support purposes as well as for auditors instead of wasting time creating audit trails just for compliance purposes. Engineers and auditors both need to know who made what changes when, how the changes were tested, when they were deployed, and what happened next. This provides visibility and traceability for support and continuous validation of compliance.
- Continuous security visibility
- Recognize that the environment is continuously changing and use automated tools to identify and understand security risks and to watch for and catch problems. Netflix has written a set of its own tools to do this, including Security Monkey, Conformity Monkey, and Penguin Shortbread (which automatically identifies microservices and continuously assesses the risk of each service based on runtime dependencies).
- Compartmentalization
- Take advantage of cloud account segregation, data tokenization, and microservices to minimize the system’s attack surface and contain attacks, and implement least privilege access policies. Recognizing that engineers will generally ask for more privileges than they need “just in case,” Netflix has created an automated tool called Repoman, which uses AWS CloudTrail activity history and reduces account privileges to what is actually needed based on what each account has done over a period of time. Compartmentalization and building up bulkheads also contain the “blast radius” of a failure, reducing the impact on operations when something goes wrong.
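The core idea of trimming privileges down to observed usage can be sketched as simple set arithmetic. This is not Netflix's implementation, and it ignores everything a real tool must handle (observation windows, wildcard permissions, rollback); the permission names in the usage note are ordinary AWS action names used only as sample data.

```python
def right_size(granted: set, used: set) -> set:
    """Keep only the granted privileges that were actually exercised."""
    return granted & used


def unused_privileges(granted: set, used: set) -> set:
    """Privileges that were granted but never used: candidates for removal."""
    return granted - used
```

For example, an account granted `s3:GetObject`, `s3:PutObject`, and `iam:CreateUser` that has only ever called `s3:GetObject` would be reduced to that single privilege, with the other two flagged for removal.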
Whether you are working in the cloud or following DevOps in your own data center, these principles are all critical to building and operating a secure and reliable system.
1 For software that is distributed externally, this should involve signing the code with a code-signing certificate from a third-party CA. For internal code, a hash should be enough to ensure code integrity.
2 “Agile Security – Field of Dreams.” Laksh Raghavan, PayPal, RSA Conference 2016. https://www.rsaconference.com/events/us16/agenda/sessions/2444/agile-security-field-of-dreams
3 At Netflix, where they follow a similar risk-assessment process, this is called “the paved road,” because the path ahead should be smooth, safe, and predictable.
4 Shannon Lietz, http://www.devsecops.org/blog/2016/1/16/fewer-better-suppliers
5 “Fuzzing at Scale.” Google Security Blog. https://security.googleblog.com/2011/08/fuzzing-at-scale.html
9 “Blameless PostMortems and a Just Culture.” https://codeascraft.com/2012/05/22/blameless-postmortems/
10 See “Splitting the Check on Compliance and Security: Keeping Developers and Auditors Happy in the Cloud.” Jason Chan, Netflix, AWS re:Invent, October 2015. https://www.youtube.com/watch?v=Io00_K4v12Y