Chapter 1. Principles and Concepts
Yes, this is a practical guide, but we do need to cover a few cloud-relevant security principles at a high level before we dive into the practical bits. If you’re a seasoned security professional new to the cloud, you may want to skim down to “The Cloud Shared Responsibility Model”.
Least Privilege
The principle of least privilege simply states that people or automated tools should be able to access only what they need to do their jobs, and no more. It’s easy to forget the automation part of this; for example, a component accessing a database should not use credentials that allow write access to the database if write access isn’t needed.
A practical application of least privilege often means that your access policies are deny by default. That is, users are granted no (or very few) privileges by default, and they need to go through the request and approval process for any privileges they require.
For cloud environments, some of your administrators will need to have access to the cloud console—a web page that allows you to create, modify, and destroy cloud assets such as virtual machines. With many providers, anyone with access to your cloud console will have godlike privileges by default for everything managed by that cloud provider. This might include the ability to read, modify, or destroy data from any part of the cloud environment, regardless of what controls are in place on the operating systems of the provisioned systems. For this reason, you need to tightly control access to and privileges on the cloud console, much as you tightly control physical data center access in on-premises environments, and record what these users are doing.
Defense in Depth
Many of the controls in this book, if implemented perfectly, would negate the need for other controls. Defense in depth is an acknowledgment that almost any security control can fail, either because an attacker is sufficiently determined or because of a problem with the way that security control is implemented. With defense in depth, you create multiple layers of overlapping security controls so that if one fails, the one behind it can still catch the attackers.
You can certainly go to silly extremes with defense in depth, which is why it’s important to understand the threats you’re likely to face, which are described later. However, as a general rule, you should be able to point to any single security control you have and say, “What if this fails?” If the answer is complete failure, you probably have insufficient defense in depth.
Threat Actors, Diagrams, and Trust Boundaries
There are different ways to think about your risks, but I typically favor an asset-oriented approach. This means that you concentrate first on what you need to protect, which is why I dig into data assets first in Chapter 2.
It’s also a good idea to keep in mind who is most likely to cause you problems. In cybersecurity parlance, these are your potential “threat actors.” For example, you may not need to guard against a well-funded state actor, but you might be in a business where a criminal can make money by stealing your data, or where a “hacktivist” might want to deface your website. Keep these people in mind when designing all of your defenses.
While there is plenty of information and discussion available on the subject of threat actors, motivations, and methods,1 in this book we’ll consider four main types of threat actors that you may need to worry about:
-
Organized crime or independent criminals, interested primarily in making money
-
Hacktivists, interested primarily in discrediting you by releasing stolen data, committing acts of vandalism, or disrupting your business
-
Inside attackers, usually interested in discrediting you or making money
-
State actors, who may be interested in stealing secrets or disrupting your business
To borrow a technique from the world of user experience design, you may want to imagine a member of each applicable group, give them a name, jot down a little about that “persona” on a card, and keep the cards visible when designing your defenses.
The second thing you have to do is figure out what needs to talk to what in your application, and the easiest way to do that is to draw a picture and figure out where your weak spots are likely to be. There are entire books on how to do this,2 but you don’t need to be an expert to draw something useful enough to help you make decisions. However, if you are in a high-risk environment, you should probably create formal diagrams with a suitable tool rather than draw stick figures.
Although there are many different application architectures, for the sample application used for illustration here, I will show a simple three-tier design. Here is what I recommend:
-
Draw a stick figure and label it “user.” Draw another stick figure and label it “administrator” (Figure 1-1). You may find later that you have multiple types of users and administrators, or other roles, but this is a good start.
-
Draw a box for the first component the user talks to (for example, the web servers), draw a line from the user to that first component, and label the line with how the user talks to that component (Figure 1-2). Note that at this point, the component may be a serverless function, a container, a virtual machine, or something else. This will let anyone talk to it, so it will probably be the first thing to go. We really don’t want the other components trusting this one more than necessary.
-
Draw other boxes behind the first for all of the other components that first system has to talk to, and draw lines going to those (Figure 1-3). Whenever you get to a system that actually stores data, draw a little symbol (I use a cylinder) next to it and jot down what data is there. Keep going until you can’t think of any more boxes to draw for your application.
-
Now draw how the administrator (and any other roles you’ve defined) accesses the application. Note that the administrator may have several different ways of talking to this application; for example, via the cloud provider’s portal or APIs, or through the operating system access, or by talking to the application similarly to how a user accesses it (Figure 1-4).
-
Draw some trust boundaries as dotted lines around the boxes (Figure 1-5). A trust boundary means that anything inside that boundary can be at least somewhat confident of the motives of anything else inside that boundary, but requires verification before trusting anything outside of the boundary. The idea is that if an attacker gets into one part of the trust boundary, it’s reasonable to assume they’ll eventually have complete control over everything in it, so getting through each trust boundary should take some effort. Note that I drew multiple web servers inside the same trust boundary; that means it’s okay for these web servers to trust each other completely, and if someone has access to one, they effectively have access to all. Or, to put it another way, if someone compromises one of these web servers, no further damage will be done by having them all compromised.
-
To some extent, we trust our entire system more than the rest of the world, so draw a dotted line around all of the boxes, including the admin, but not the user (Figure 1-6). Note that if you have multiple admins, like a web server admin and a database admin, they might be in different trust boundaries. The fact that there are trust boundaries inside of trust boundaries shows the different levels of trust. For example, the servers here may be willing to accept network connections from servers in other trust boundaries inside the application, but still verify their identities. They may not even be willing to accept connections from systems outside of the whole application trust boundary.
We’ll use this diagram of an example application throughout the book when discussing the shared responsibility model, asset inventory, controls, and monitoring. Right now, there are no cloud-specific controls shown in the diagram, but that will change as we progress through the chapters. Look at any place a line crosses a trust boundary. These are the places we need to focus on securing first!
Cloud Delivery Models
There is an unwritten law that no book on cloud computing is complete without an overview of Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). Rather than the standard overview, I’d like to point out that these service models are useful only for a general understanding of concepts; in particular, the line between IaaS and PaaS is becoming increasingly blurred. Is a content delivery network (CDN) service that caches information for you around the internet to keep it close to users a PaaS or IaaS? It doesn’t really matter. What’s important is that you understand what is (and isn’t!) provided by the service, not whether it fits neatly into any particular category.
Risk Management
Risk management is a deep subject, with entire books written about it. I recommend reading The Failure of Risk Management: Why It’s Broken and How to Fix It by Douglas W. Hubbard (Wiley), and NIST Special Publication 800-30 Rev 1 if you’re interested in getting serious about risk management. In a nutshell, humans are really bad at assessing risk and figuring out what to do about it. This section is intended to give you just the barest essentials for managing the risk of security incidents and data breaches.
At the risk of being too obvious, a risk is something bad that could happen. In most risk management systems, the level of risk is based on a combination of how probable it is that the bad thing will happen (likelihood), and how bad the results will be if it does happen (impact). For example, something that’s very likely to happen (such as someone guessing your password of “1234”) and will be very bad if it does happen (such as you losing all of your customers’ files and paying large fines) would be a high risk. Something that’s very unlikely to happen (such as an asteroid wiping out two different regional data centers at once) but that would be very bad if it does happen (going out of business) might only be a low risk, depending on the system you use for deciding the level of risk.4
In this book, I’ll talk about unknown risks (where we don’t have enough information to know what the likelihoods and impacts are) and known risks (where we at least know what we’re up against). Once you have an idea of the known risks, you can do one of four things with them:
-
Avoid the risk. In information security this typically means you turn off the system—no more risk, but also none of the benefits you had from running the system in the first place.
-
Mitigate the risk. It’s still there, but you do additional things to lower either the likelihood that the bad thing will happen or the impact if it does happen. For example, you may choose to store less sensitive data so that if there is a breach, the impact won’t be as bad.
-
Transfer the risk. You pay someone else to manage things so that the risk is their problem. This is done a lot with the cloud, where you transfer many of the risks of managing the lower levels of the system to the cloud provider.
-
Accept the risk. After looking at the overall risk level and the benefits of continuing the activity, you decide to write down that the risk exists, get all of your stakeholders to agree that it’s a risk, and then move on.
Any of these actions may be reasonable. However, what’s not acceptable is to either have no idea what your risks are, or to have an idea of what the risks are and accept them without weighing the consequences or getting buy-in from your stakeholders. At a minimum, you should have a list somewhere in a spreadsheet or document that details the risks you know about, the actions taken, and any approvals needed.
1 The Verizon Data Breach Investigations Report is an excellent free resource for understanding different types of successful attacks, organized by industry and methods, and the executive summary is very readable.
2 I recommend Threat Modeling: Designing for Security, by Adam Shostack (Wiley).
3 Original concept from an article by Albert Barron.
4 Risks can also interact, or aggregate. There may be two risks that each have relatively low likelihood and impacts, but they may be likely to occur together and the impacts can combine to be higher. For example, the impact of either power line in a redundant pair going out may be negligible, but the impact of both going out may be really bad. This is often difficult to spot; the Atlanta airport power outage in 2017 is a good example.
Get Practical Cloud Security now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.