Chapter 1. Principles and Concepts

Yes, this is a practical guide, but we do need to cover a few cloud-relevant security principles at a high level before we dive into the practical bits. If you’re a seasoned security professional new to the cloud, you may want to skim down to “The Cloud Shared Responsibility Model”.

Least Privilege

The principle of least privilege simply states that people or automated tools should be able to access only what they need to do their jobs, and no more. It’s easy to forget the automation part of this; for example, a component accessing a database should not use credentials that allow write access to the database if write access isn’t needed.

A practical application of least privilege often means that your access policies are deny by default. That is, users are granted no (or very few) privileges by default, and they need to go through the request and approval process for any privileges they require.

For cloud environments, some of your administrators will need to have access to the cloud console—a web page that allows you to create, modify, and destroy cloud assets such as virtual machines. With many providers, anyone with access to your cloud console will have godlike privileges by default for everything managed by that cloud provider. This might include the ability to read, modify, or destroy data from any part of the cloud environment, regardless of what controls are in place on the operating systems of the provisioned systems. For this reason, you need to tightly control access to and privileges on the cloud console, much as you tightly control physical data center access in on-premises environments, and record what these users are doing.

Defense in Depth

Many of the controls in this book, if implemented perfectly, would negate the need for other controls. Defense in depth is an acknowledgment that almost any security control can fail, either because an attacker is sufficiently determined or because of a problem with the way that security control is implemented. With defense in depth, you create multiple layers of overlapping security controls so that if one fails, the one behind it can still catch the attackers.

You can certainly go to silly extremes with defense in depth, which is why it’s important to understand the threats you’re likely to face, which are described later. However, as a general rule, you should be able to point to any single security control you have and say, “What if this fails?” If the answer is complete failure, you probably have insufficient defense in depth.

Threat Actors, Diagrams, and Trust Boundaries

There are different ways to think about your risks, but I typically favor an asset-oriented approach. This means that you concentrate first on what you need to protect, which is why I dig into data assets first in Chapter 2.

It’s also a good idea to keep in mind who is most likely to cause you problems. In cybersecurity parlance, these are your potential “threat actors.” For example, you may not need to guard against a well-funded state actor, but you might be in a business where a criminal can make money by stealing your data, or where a “hacktivist” might want to deface your website. Keep these people in mind when designing all of your defenses.

While there is plenty of information and discussion available on the subject of threat actors, motivations, and methods,1 in this book we’ll consider four main types of threat actors that you may need to worry about:

  • Organized crime or independent criminals, interested primarily in making money

  • Hacktivists, interested primarily in discrediting you by releasing stolen data, committing acts of vandalism, or disrupting your business

  • Inside attackers, usually interested in discrediting you or making money

  • State actors, who may be interested in stealing secrets or disrupting your business

To borrow a technique from the world of user experience design, you may want to imagine a member of each applicable group, give them a name, jot down a little about that “persona” on a card, and keep the cards visible when designing your defenses.

The second thing you have to do is figure out what needs to talk to what in your application, and the easiest way to do that is to draw a picture and figure out where your weak spots are likely to be. There are entire books on how to do this,2 but you don’t need to be an expert to draw something useful enough to help you make decisions. However, if you are in a high-risk environment, you should probably create formal diagrams with a suitable tool rather than draw stick figures.

Although there are many different application architectures, for the sample application used for illustration here, I will show a simple three-tier design. Here is what I recommend:

  1. Draw a stick figure and label it “user.” Draw another stick figure and label it “administrator” (Figure 1-1). You may find later that you have multiple types of users and administrators, or other roles, but this is a good start.

    prcs 0101
    Figure 1-1. User and administrator roles
  2. Draw a box for the first component the user talks to (for example, the web servers), draw a line from the user to that first component, and label the line with how the user talks to that component (Figure 1-2). Note that at this point, the component may be a serverless function, a container, a virtual machine, or something else. This will let anyone talk to it, so it will probably be the first thing to go. We really don’t want the other components trusting this one more than necessary.

    prcs 0102
    Figure 1-2. First component
  3. Draw other boxes behind the first for all of the other components that first system has to talk to, and draw lines going to those (Figure 1-3). Whenever you get to a system that actually stores data, draw a little symbol (I use a cylinder) next to it and jot down what data is there. Keep going until you can’t think of any more boxes to draw for your application.

    prcs 0103
    Figure 1-3. Additional components
  4. Now draw how the administrator (and any other roles you’ve defined) accesses the application. Note that the administrator may have several different ways of talking to this application; for example, via the cloud provider’s portal or APIs, or through the operating system access, or by talking to the application similarly to how a user accesses it (Figure 1-4).

    prcs 0104
    Figure 1-4. Administrator access
  5. Draw some trust boundaries as dotted lines around the boxes (Figure 1-5). A trust boundary means that anything inside that boundary can be at least somewhat confident of the motives of anything else inside that boundary, but requires verification before trusting anything outside of the boundary. The idea is that if an attacker gets into one part of the trust boundary, it’s reasonable to assume they’ll eventually have complete control over everything in it, so getting through each trust boundary should take some effort. Note that I drew multiple web servers inside the same trust boundary; that means it’s okay for these web servers to trust each other completely, and if someone has access to one, they effectively have access to all. Or, to put it another way, if someone compromises one of these web servers, no further damage will be done by having them all compromised.

    prcs 0105
    Figure 1-5. Component trust boundaries
  6. To some extent, we trust our entire system more than the rest of the world, so draw a dotted line around all of the boxes, including the admin, but not the user (Figure 1-6). Note that if you have multiple admins, like a web server admin and a database admin, they might be in different trust boundaries. The fact that there are trust boundaries inside of trust boundaries shows the different levels of trust. For example, the servers here may be willing to accept network connections from servers in other trust boundaries inside the application, but still verify their identities. They may not even be willing to accept connections from systems outside of the whole application trust boundary.

    prcs 0106
    Figure 1-6. Whole application trust boundary

We’ll use this diagram of an example application throughout the book when discussing the shared responsibility model, asset inventory, controls, and monitoring. Right now, there are no cloud-specific controls shown in the diagram, but that will change as we progress through the chapters. Look at any place a line crosses a trust boundary. These are the places we need to focus on securing first!

Cloud Delivery Models

There is an unwritten law that no book on cloud computing is complete without an overview of Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). Rather than the standard overview, I’d like to point out that these service models are useful only for a general understanding of concepts; in particular, the line between IaaS and PaaS is becoming increasingly blurred. Is a content delivery network (CDN) service that caches information for you around the internet to keep it close to users a PaaS or IaaS? It doesn’t really matter. What’s important is that you understand what is (and isn’t!) provided by the service, not whether it fits neatly into any particular category.

The Cloud Shared Responsibility Model

The most basic security question you must answer is, “What aspects of security am I responsible for?” This is often answered implicitly in an on-premises environment. The development organization is responsible for code errors, and the operations organization (IT) is responsible for everything else. Many organizations now run a DevOps model where those responsibilities are shared, and team boundaries between development and operations are blurred or nonexistent. Regardless of how it’s organized, almost all security responsibility is inside the company.

Perhaps one of the most jarring changes when moving from an on-premises environment to a cloud environment is a more complicated shared responsibility model for security. In an on-premises environment, you may have had some sort of internal document of understanding or contract with IT or some other department that ran servers for you. However, in many cases business users of IT were used to handing the requirements or code to an internal provider and having everything else done for them, particularly in the realm of security.

Even if you’ve been operating in a cloud environment for a while, you may not have stopped to think about where the cloud provider’s responsibility ends and where yours begins. This line of demarcation is different depending on the types of cloud service you’re purchasing. Almost all cloud providers address this in some way in their documentation and education, but the best way to explain it is to use the analogy of eating pizza.

With Pizza-as-a-Service,3 you’re hungry for pizza. There are a lot of choices! You could just make a pizza at home, although you’d need to have quite a few ingredients and it would take a while. You could run up to the grocery store and grab a take-and-bake; that only requires you to have an oven and a place to eat it. You could call your favorite pizza delivery place. Or, you could just go sit down at a restaurant and order a pizza. If we draw a diagram of the various components and who’s responsible for them, we get something like Figure 1-7.

The traditional on-premises world is like making a pizza at home. You have to buy a lot of different components and put them together yourself, but you get complete flexibility. Anchovies and cinnamon on wheat crust? If you can stomach it, you can make it.

When you use Infrastructure as a Service, though, the base layer is already done for you. You can bake it to taste and add a salad and drinks, and you’re responsible for those things. When you move up to Platform as a Service, even more decisions are already made for you, and you just use that service as part of developing your overall solution. (As mentioned in the previous section, sometimes it can be difficult to categorize a service as IaaS or PaaS, and they’re growing together in many cases. The exact classification isn’t important; what’s important is that you understand what the service provides and what your responsibilities are.)

When you get to Software as a Service (compared to dining out in Figure 1-7), it seems like everything is done for you. It’s not, though. You still have a responsibility to eat safely, and the restaurant is not responsible if you choke on your food. In the SaaS world, this largely comes down to managing access control properly.

prcs 0107
Figure 1-7. Pizza as a Service

If we draw the diagram with technology instead of pizza, it looks more like Figure 1-8.

prcs 0108
Figure 1-8. Cloud shared responsibility model

The reality of cloud computing is unfortunately a little more complicated than eating pizza, so there are some gray areas. At the bottom of the diagram, things are concrete (often literally). The cloud provider has complete responsibility for physical infrastructure security—which often involves controls beyond what many companies can reasonably do on-premises, such as biometric access with anti-tailgating measures, security guards, slab-to-slab barriers, and similar controls to keep unauthorized personnel out of the physical facilities.

Likewise, if the provider offers virtualized environments, the virtualized infrastructure security controls keeping your virtual environment separate from other virtual environments are the provider’s responsibility. When the Spectre and Meltdown vulnerabilities came to light in early 2018, one of the potential effects was that users in one virtual machine could read the memory of another virtual machine on the same physical computer. For IaaS customers, fixing that part of the vulnerability was the responsibility of the cloud provider, but fixing the vulnerabilities within the operating system was the customer’s responsibility.

Network security is shown as a shared responsibility in the IaaS section of Figure 1-8. Why? It’s hard to show on a diagram, but there are several layers of networking, and the responsibility for each lies with a different party. The cloud provider has its own network that is its responsibility, but there is usually a virtual network on top (for example, some cloud providers offer a virtual private cloud), and it’s the customer’s responsibility to carve this into reasonable security zones and put in the proper rules for access between them. Many implementations also use overlay networks, firewalls, and transport encryption that are the customer’s responsibility. This will be discussed in depth in Chapter 6.

Operating system security is usually straightforward: it’s your responsibility if you’re using IaaS, and it’s the provider’s responsibility if you’re purchasing platform or software services. In general, if you’re purchasing those services, you have no access to the underlying operating system. (As as general rule of thumb, if you have the ability to break it, you usually have the responsibility for securing it!)

Middleware, in this context, is a generic name for software such as databases, application servers, or queuing systems. They’re in the middle between the operating system and the application—not used directly by end users, but used to develop solutions for end users. If you’re using a PaaS, middleware security is often a shared responsibility; the provider might keep the software up to date (or make updates easily available to you), but you retain the responsibility for security-relevant settings such as encryption.

The application layer is what the end user actually uses. If you’re using SaaS, vulnerabilities at this layer (such as cross-site scripting or SQL injection) are the provider’s responsibility, but if you’re reading this book you’re probably not just using someone else’s SaaS. Even if all of the other layers have bulletproof security, a vulnerability at the application security layer can easily expose all of your information.

Finally, data access security is almost always your responsibility as a customer. If you incorrectly tell your cloud provider to allow access to specific data, such as granting incorrect storage permissions, middleware permissions, or SaaS permissions, there’s really nothing the provider can do.

The root cause of many security incidents is an assumption that the cloud provider is handling something, when it turns out nobody was handling it. Many real-world examples of security incidents stemming from poor understanding of the shared responsibility model come from open Amazon Web Services Simple Storage Service (AWS S3) buckets. Sure, AWS S3 storage is secure and encrypted, but none of that helps if you don’t set your access controls properly. This misunderstanding has caused the loss of:

  • Data on 198 million US voters

  • Auto-tracking company records

  • Wireless customer records

  • Over 3 million demographic survey records

  • Over 50,000 Indian citizens’ credit reports

If you thought a discussion of shared responsibility was too basic, congratulations—you’re in the top quartile. According to a Barracuda Networks survey in 2017, the shared responsibility model is still widely misunderstood among businesses. Some 77% of IT decision makers said they believed public cloud providers were responsible for securing customer data in the cloud, and 68% said they believed these providers were responsible for securing customer applications as well. If you read your agreement with your cloud provider, you’ll find this just isn’t true!

Risk Management

Risk management is a deep subject, with entire books written about it. I recommend reading The Failure of Risk Management: Why It’s Broken and How to Fix It by Douglas W. Hubbard (Wiley), and NIST Special Publication 800-30 Rev 1 if you’re interested in getting serious about risk management. In a nutshell, humans are really bad at assessing risk and figuring out what to do about it. This section is intended to give you just the barest essentials for managing the risk of security incidents and data breaches.

At the risk of being too obvious, a risk is something bad that could happen. In most risk management systems, the level of risk is based on a combination of how probable it is that the bad thing will happen (likelihood), and how bad the results will be if it does happen (impact). For example, something that’s very likely to happen (such as someone guessing your password of “1234”) and will be very bad if it does happen (such as you losing all of your customers’ files and paying large fines) would be a high risk. Something that’s very unlikely to happen (such as an asteroid wiping out two different regional data centers at once) but that would be very bad if it does happen (going out of business) might only be a low risk, depending on the system you use for deciding the level of risk.4

In this book, I’ll talk about unknown risks (where we don’t have enough information to know what the likelihoods and impacts are) and known risks (where we at least know what we’re up against). Once you have an idea of the known risks, you can do one of four things with them:

  1. Avoid the risk. In information security this typically means you turn off the system—no more risk, but also none of the benefits you had from running the system in the first place.

  2. Mitigate the risk. It’s still there, but you do additional things to lower either the likelihood that the bad thing will happen or the impact if it does happen. For example, you may choose to store less sensitive data so that if there is a breach, the impact won’t be as bad.

  3. Transfer the risk. You pay someone else to manage things so that the risk is their problem. This is done a lot with the cloud, where you transfer many of the risks of managing the lower levels of the system to the cloud provider.

  4. Accept the risk. After looking at the overall risk level and the benefits of continuing the activity, you decide to write down that the risk exists, get all of your stakeholders to agree that it’s a risk, and then move on.

Any of these actions may be reasonable. However, what’s not acceptable is to either have no idea what your risks are, or to have an idea of what the risks are and accept them without weighing the consequences or getting buy-in from your stakeholders. At a minimum, you should have a list somewhere in a spreadsheet or document that details the risks you know about, the actions taken, and any approvals needed.

1 The Verizon Data Breach Investigations Report is an excellent free resource for understanding different types of successful attacks, organized by industry and methods, and the executive summary is very readable.

2 I recommend Threat Modeling: Designing for Security, by Adam Shostack (Wiley).

3 Original concept from an article by Albert Barron.

4 Risks can also interact, or aggregate. There may be two risks that each have relatively low likelihood and impacts, but they may be likely to occur together and the impacts can combine to be higher. For example, the impact of either power line in a redundant pair going out may be negligible, but the impact of both going out may be really bad. This is often difficult to spot; the Atlanta airport power outage in 2017 is a good example.

Get Practical Cloud Security now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.