Introduction

How you get to know is what I want to know.

Richard Feynman, American physicist

In this introduction, we’ll explain the very basics of threat modeling. We’ll also cover the most crucial security principles you need to know as the foundation for assessing the security of the systems you are analyzing.

The Basics of Threat Modeling

Let’s begin by taking a bird’s-eye view of what threat modeling is, why it’s useful, and how it fits into the development life cycle and overall security plan.

What Is Threat Modeling?

Threat modeling is the process of analyzing a system to look for weaknesses that come from less-desirable design choices. The goal of the activity is to identify these weaknesses before they are baked into the system (as a result of implementation or deployment) so you can take corrective action as early as possible. The activity of threat modeling is a conceptual exercise that aims to help you understand which characteristics of a system’s design should be modified to reduce risk in the system to an acceptable level for its owners, users, and operators.

When performing threat modeling, you look at a system as a collection of its components and their interactions with the world outside the system (like other systems it interacts with) and the actors that may perform actions on these systems. Then you try to imagine how these components and interactions may fail or be made to fail. From this process, you’ll identify threats to the system, which will in turn lead to changes and modifications to the system. The result is a system that can resist the threats you imagined.

But let’s make clear right from the beginning: threat modeling is a cyclic activity. It starts with a clear objective, continues with analysis and actions, and then it repeats. It is not a silver bullet—by itself it does not solve all your security issues. It is also not a push-button tool, like a scanner that you point at your website or your code repository that generates a punch list of items to be ticked off. Threat modeling is a logical, intellectual process that will be most effective if you involve most, if not all, of your team. It will generate discussion and create clarity of your design and execution. All of this requires work and a certain amount of specialized knowledge.

The first rule of threat modeling might be the old maxim garbage in, garbage out (GIGO).1 If you make threat modeling part of your team’s toolbox and get everyone to participate in a positive manner, you will reap its many benefits, but if you enter into it half-heartedly, without a complete understanding of its strengths and shortcomings or as a compliance “check the box” item, you’ll see it only as a time sink. Once you find a methodology that works for you and your team and put in the effort needed to make it work, your overall security posture will improve substantially.

Why You Need Threat Modeling

You need threat modeling because it will make your work easier and better in the long term. It will lead to cleaner architectures, well-defined trust boundaries (you don’t know what those are yet and why they are important, but soon you will!), focused security testing, and better documentation. And most of all, it will instill in you and your team the superpower of security mindedness in an organized, orchestrated way, leading to better security standards and guidelines across your development effort.

As valuable as all those side benefits are, they are not the most important thing. Understanding what could possibly go wrong in your system and what you can do about it will increase your trust in what you’re delivering, leaving you free to concentrate on other facets of the system. And that is what’s really behind the need for threat modeling.

It is also important to point out why you don’t need threat modeling. It is not going to solve all your security problems by itself; nor will it transform your team into security experts immediately. Most of all, you don’t need it for compliance. An empty exercise aimed only at putting a check mark on the compliance list will bring you more frustration than simply acknowledging that you do not have that specific requirement covered.

Obstacles

The trouble with programmers is that you can never tell what a programmer is doing until it’s too late.

Seymour R. Cray, creator of the Cray line of supercomputers

This maxim holds true to this day. Give a developer a specification or a reasonably well-documented set of requirements, and stand back, and many interesting things may happen.

Honestly, we know that development teams can be stressed-out overachievers who work under heavy demands and heavy responsibility. You have to deal with an almost constantly changing landscape of learning, becoming proficient, and then forgetting whole subdisciplines. It is unfair to pressure you on “not knowing some security thing that’s really basic and important.” Consider that the entire training content industry is mostly focused on delivering business-oriented objectives such as compliance and meeting training goals and other assorted metrics. There is significant room for improvement in actually delivering effective, useful content that development teams can translate into knowledge and practical use.

One of the tasks of security professionals is to further the security education of development communities. This includes how to implement secure systems and how to assess the security of code and systems, post facto. It may seem easier to rely on an expensive set of tools to supplement (and to a large extent, hide from view) the organization’s security expertise. The challenge is that the expertise built into the tools is often hidden from the user; development teams would benefit greatly if the methods of detection were transparent to them. Here are some examples:

  • Computer-based training (CBT) is the scourge of every new worker. Sets of 45 minutes of boring voices reading tired-looking slides, in the usual standard fonts, with the same stock pictures and images? And worse still, the innocuous, “solve by exclusion,” teach-nothing multiple-choice questions?

  • Overreliance on “silver bullet” scanners and static code analyzers that promise to use artificial intelligence, machine learning, taint analysis, attack trees, and the Powers of Grayskull, but fail to consistently produce the same results, or give more false positives than actually useful answers. Or the analysis tools expect the whole system to exist before they can run a scan, not to mention inserting long delays into the build process that are anathema to continuous integration/continuous delivery (CI/CD) values.

  • Consulting services where, upon request, a security practitioner will swoop in, execute remediation work (or “directed training”), and disappear (we call this seagull consulting; they swoop in, poop on you, and then fly off), leaving the team to deal with the consequences. Relying on a just-in-time security consultant carries significant downsides: they have no vested interest in the outcome of their actions, they are external to the team (if not to the enterprise), they bring a personal bias, and they perform “magic,” leaving the team feeling that something happened but without a sense of what transpired. Cargo cult2 behavior follows, as the team tries to constantly replicate that subset of results the consultant left behind.

We security professionals have also created a false sense of developer expectations within organizations:

  • An organization can buy its way to a strong security posture. If it invests enough money in tools, it will solve all of its security problems.

  • Thirty minutes of mandatory training a quarter is sufficient for the organization to pass the audit. These 30 minutes should be sufficient for development teams to learn what is expected of them. Since development teams have access to top-notch training content, the expensive tools they use will only “look over their shoulders” and validate that they have, indeed, done their job in a perfectly secure manner.

Lately (since mid-2019) the security industry has been consumed by the idea of shifting left. Imagine a workflow that you read from left to right; the start of the workflow is on the left. When we say “shifting left,” we mean moving security processes as far “left”—toward the beginning of the development workflow (of whatever development methodology is in use)—as possible. This allows security activities to happen, and issues to be dealt with, as early as possible. An activity such as threat modeling, which is closely associated with design, should happen as early as possible in the lifetime of a system. And, if it didn’t happen then, it should happen right now.

Note

We don’t subscribe to the “shift left” phenomenon, preferring instead to start left by using methodologies that begin with design, or sooner—with requirements—as the foundation for the security of a system.

With changes to the process resulting in a less linear “left-to-right” development cycle, shifting left may not be capable of addressing all security needs in a system. Instead, the security community will need to shift even further left, into the lives of developers and designers as individuals, before a system is even being considered. There we will need to focus on training development teams to make secure choices, and on giving them capabilities to bring along, so that threats are avoided at a much more fundamental level.

Then there is the implementation, where security has supposedly been shifted left by the industry’s collective training efforts, and is expressed semantically and logically as secure code. But if training has failed to deliver on the promised expectations, what corrective measures are available to address the failed assumptions?

Let’s look at another angle of the problem and then connect the pieces into a coherent response to this rant (an invitation to a crusade!). Some speak about “a place at the table” when discussing security strategy. Security teams and executives want the “stakeholders” to hold “a place” for security in the “ongoing discussion.” This allows them to justify their need to take that slice of the resources pie. But there’s another important resource that is not as recognized, because it is obfuscated by “extensive training” and “silver bullet tools.” And that resource is the developer’s time and focus.

Let’s consider a web developer. Countless memes reflect the fact that if today a web developer learns everything they can about the LAMP stack3 before breakfast, that knowledge becomes useless right after lunch because the whole industry will have moved to the MEAN stack.4 And the MEAN stack will be superseded two grande lattes later by yet another shiny new thing until it comes right around again to the new, improved (and totally non-backward-compatible!) version of where we just started. Each one of these new stacks brings about a new set of security challenges and security-related idioms and mechanisms that must be understood and incorporated to effectively protect the system they are developing. And of course, each stack requires a distinct security contract that the web developer must learn and become fluent in quickly.

But the website can’t be down, and its administration is supposed to happen at the same time the developer is learning their new toy tool. How can security possibly expect to share the same pie (i.e., the developer’s time and attention) and receive anything but a sliver of a slice?

And this is where the crusade begins—as Richard Feynman tells us: “Teach principles, not formulas.” In this book, we will focus on principles to help you understand and think through what threat modeling is for you, how it can help in your specific case, and how you can best apply it to your project and your team.

Threat Modeling in the System Development Life Cycle

Threat modeling is an activity performed during the system development life cycle that is critical to the security of the system. If threat modeling is not performed in some fashion, security faults will likely be introduced through design choices that may be easily exploited, and that will most definitely be hard (and costly5) to fix later. In keeping with the “build in, not bolt on” principle of security, threat modeling should not be considered a compliance milestone; real-world consequences exist for failing to perform this activity when it matters most.

Most successful companies today don’t execute projects the way they did even a couple of years ago. For example, development paradigms like serverless computing,6 or some of the latest trends and tools in CI/CD,7 have made a deep impact on how development teams design, implement, and deploy today’s systems.

Because of market demand and the race to be first, you rarely have the opportunity nowadays to sit down prior to the development of a system and see a fully fleshed-out design. Product teams rely on “minimum viable product” versions to introduce their new ideas to the public and to start building a brand and a following. They then rely on incremental releases to add functionality and make changes as issues arise. This practice results in significant changes to design occurring later in the development cycle.

Modern systems are of a complexity not seen before. You might use many third-party components, libraries, and frameworks (which may be open or closed source) to build your new software, but these components are often poorly documented, poorly understood, and poorly secured. To create “simple” systems, you rely on intricate layers of software, services, and capabilities. Again, using serverless deployments as an example, to say “I don’t care about the environment, libraries, machines, or network, I only care about my functions” is shortsighted. How much machinery is hidden behind the curtain? How much control do you have over what’s happening “under” your functions? How do these things impact the overall security of your system? How do you validate that you are using the most appropriate roles and access rules?

To answer those questions reliably and obtain immediate results, you might be tempted to use an external security expert. But expertise in security can vary, and experts can be expensive to hire. Some experts focus on specific technologies or areas, and others’ focus is broad but shallow. Of course, this by no means describes every consultant, and we will be the first to attest to having had some great experiences with threat modeling consultants. However, you can see that there is a huge incentive to develop in-house knowledge of threat modeling and try to adapt it as much as possible to your team’s development methodology.

Developing secure systems

Regardless of the development methodology you use, your system’s development must pass through some very specific phases (see Figure I-1):

  • Idea inception

  • Design

  • Implementation

  • Testing

  • Deployment

In the waterfall methodology, for example, these phases naturally follow each other. Note that documentation plays an ongoing role—it must happen in parallel with the other phases to be truly efficient. When using this methodology, it is easy to see that a threat model provides the most benefit at design time.

This is an assertion you will see many times in this book. We consistently link threat modeling with design. Why is that?

A much-quoted8 concept holds that the cost of solving an issue rises significantly the closer to (or after) deployment the fix happens. This is quite obvious to people familiar with making and marketing software; it is much cheaper to apply solutions to a system in development than to one already deployed at thousands or, in some extreme cases, millions of places.9 You don’t have to deal with the liability of some users not applying a patch, or the possible failures in backward compatibility introduced by patching a system. You don’t have to deal with users who cannot for one reason or another move forward with the patch. And you don’t have to incur the cost of supporting a lengthy and sometimes unstable upgrade process.

So, threat modeling by its nature looks at a design, and tries to identify security flaws. For example, if your analysis shows that a certain mode of access uses a hardcoded password, it gets identified as a finding to be addressed. If a finding goes unaddressed, you are probably dealing with an issue that will be exploited later in the life of the system. This is also known as a vulnerability, which has a probability of exploitation and an associated cost if exploited. You might also fail to identify an issue, or fail to make a correct determination of something that can be exploited. Perfection and completeness are not goals of this exercise.

Note

The key objective of threat modeling is to identify flaws so they become findings (issues that you can address) and not vulnerabilities (issues that can be exploited). You can then apply mitigations that reduce both the probability of exploitation and the cost of being exploited (that is, the damage, or impact).

Once you identify a finding, you move to mitigate, or rectify, it. You do this by applying appropriate controls; for example, you might create a dynamic, user-defined password instead of a hardcoded one. Or, if the case warrants, you might run multiple tests against that password to ensure its strength. Or you might let the user decide on a password policy. Or, you might change your approach altogether and entirely remove the flaw by removing the password use and offer support for WebAuthn10 instead. In some cases, you might just assume the risk—you decide that for the manner in which the system will be deployed, it could be OK to use a hardcoded password. (Hint: It is not. Really. Think about it.) Sometimes you have to determine that a risk is acceptable. In those cases, you need to document the finding, identify and describe the rationale for not addressing it, and make that part of your threat model.
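For instance, a minimal sketch of the “run multiple tests against that password” mitigation might look like the following Python snippet; the specific rules (length, character classes, a small deny list) are our own illustrative assumptions, not a recommended policy standard.

```python
import re

# Hypothetical deny list; a real check would consult a breach corpus.
COMMON_PASSWORDS = {"password", "123456", "letmein", "admin"}

def password_findings(candidate):
    """Return a list of policy violations; an empty list means the password passed."""
    findings = []
    if len(candidate) < 12:
        findings.append("shorter than 12 characters")
    if not re.search(r"[A-Z]", candidate):
        findings.append("no uppercase letter")
    if not re.search(r"[a-z]", candidate):
        findings.append("no lowercase letter")
    if not re.search(r"\d", candidate):
        findings.append("no digit")
    if candidate.lower() in COMMON_PASSWORDS:
        findings.append("appears in a list of commonly used passwords")
    return findings

print(password_findings("admin"))                 # several violations
print(password_findings("Tr0ub4dor&3xtra-long"))  # passes these particular checks
```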

It is important to emphasize (and we will return to this throughout the book) that threat modeling is an evolutionary process. You may not find all the flaws in your system the first time it is analyzed. For example, perhaps you didn’t have the appropriate resources or the correct stakeholders examining the system. But having an initial threat model is much better than having no threat model at all. And the next iteration, when the threat model is updated, will be better: it will identify other flaws and carry a higher level of assurance that flaws have not been overlooked. You and your team will acquire the experience and confidence that will lead you to consider new and more complex or subtle attacks and vectors, and your system will constantly improve.

No more waterfalling

Let’s move forward to the more modern Agile and CI/CD approaches.

Because these are faster ways of developing and deploying software, you may find it impossible to stop everything, initiate a proper design session, and agree on what needs to happen. Sometimes your design evolves with requirements from customers, and other times your design emerges from the ongoing development of your system. In these situations, it can be hard to predict the overall design of the complete system (or even to know what the complete system is), and you may not be able to make wide-ranging design modifications beforehand.

Many design proposals outline how to perform threat modeling under these circumstances—from Microsoft’s proposal of “security sprints” to applying threat modeling against smaller system units, iteratively, at every sprint. And, unfortunately, claims have been made that threat modeling “reduces the velocity” of an Agile team. Is it better to reduce the velocity of an Agile team, or that of a team of hackers who are trying to access your data? For right now, the important thing is to recognize the issue; we will point at possible solutions later.

Once you address security in the design process, you will see how security impacts all other phases of development. This will help you recognize how threat modeling can have an even bigger impact on the overall security posture of the system, which is a collective measure of:

  • The current state of security within the system

  • Attack vectors, intrusion points, and opportunities to change system behavior that are available for an actor to explore and exploit (also known as the attack surface)

  • The existing vulnerabilities and weaknesses within the system (also known as the security debt)

  • The combined risk to the system and/or the business resulting from these factors

Implementation and testing

It is hard not to consider implementation and testing as the most important aspect of security in development. At the end of the day, security problems come (mostly!) from issues or mistakes made when putting lines of code into place. Some of the most infamous security issues—Heartbleed, anyone?—and most buffer overflow issues stem not from bad design, but from lines of code that didn’t do what they were supposed to do, or did it in an unexpected way.

When you look at classes of vulnerabilities (for example buffer overflows and injection issues), it is easy to see how a developer may inadvertently introduce them. It is easy to cut and paste a previously used stanza, or fall into the “who would possibly do that?” belief when considering bad input. Or the developer may simply introduce errors due to ignorance, time constraints, or other factors without any consideration of security.

Many tools out there identify vulnerabilities in written code. Some do so by performing static analysis of the source code; others do it by running the code against large sets of simulated or malformed inputs and identifying bad outcomes (a technique known as fuzzing). Machine learning has recently emerged as another alternative for identifying “bad code.”

But does threat modeling influence these code-related issues? That depends. If you look at a system as a whole and decide you are able to completely remove an entire class of vulnerabilities by addressing the root flaw, then you have an opportunity at design time to address code-related issues. Google did this with cross-site scripting (and other vulnerability classes) by instituting libraries and patterns to be used in all products that deal with the issue.11 Unfortunately, choices made to address some types of issues may cut off any avenue to address other concerns. For example, let’s say you are working on a system with primary requirements for high performance and high reliability. You may choose to use a language that offers direct memory control and less execution overhead, such as C, instead of languages like Go or Java that offer better memory management capabilities. In this case, you may have limited options to influence the breadth of potential security concerns that need to be addressed by changing the technology stack. This means that you have to use development-time and testing-time tools to police the outcome.

Documentation and deployment

As systems are developed, the teams responsible for them may go through a self-development process. Tribal knowledge, or institutional knowledge, exists when a set of individuals comes to learn or understand something and retains that knowledge without documenting it. However, as team membership changes over time, with individuals leaving the team and new ones joining, this tribal knowledge can be lost.

Luckily, a well-documented threat model is a great vehicle to provide new team members with this formal and proprietary knowledge. Many obscure data points, justifications, and general thought processes (e.g., “Why did you folks do it like this here?!”) are well suited for being captured as documentation in a threat model. Any decisions made to overcome constraints, and their resulting security impacts, are also good candidates for documentation. The same goes with deployment—a threat model is a great place to reference an inventory of third-party components, how they are kept up-to-date, the efforts required to harden them, and the assumptions made when configuring them. Something as simple as an inventory of network ports and their protocols explains not only the way data flows in the system, but also deployment decisions concerning authentication of hosts, configuration of firewalls, etc. All these kinds of information fit well into a threat model, and if you need to respond to compliance audits and third-party audits, locating and providing relevant details becomes much easier.

Essential Security Principles

Note

The remainder of this Introduction gives a brief overview of the foundational security concepts and terminology that are critically important for both development teams and security practitioners to have at least some familiarity with. If you wish to learn more about any of these principles, check out the many excellent references we provide throughout this chapter and the book.

Familiarity with these principles and this terminology is a key foundation for additional learning—as an individual or as a team—as you go through your security travels.

Basic Concepts and Terminology

Figure I-2 highlights crucial concepts in system security. Understanding these relationships and the nomenclature of security is key to understanding why threat modeling is critically important to a secure system design.

Figure I-2. Relationships of security terminology

A system contains assets—functionality its users depend upon, and data accepted, stored, manipulated, or transmitted by the system. The system’s functionality may contain defects, which are also known as weaknesses. If these weaknesses are exploitable, meaning if they are vulnerable to external influence, they are known as vulnerabilities, and exploitation of them may put the operations and data of the system at risk of exposure. An actor (an individual or a process external to the system) may have malicious intent and may try to exploit a vulnerability, if the conditions exist to make that possible; some skilled attackers are capable of altering conditions to create opportunities to attempt exploitation. An actor creates a threat event in this case, and through this event threatens the system with a particular effect (such as stealing data or causing functionality to misbehave).

The combination of functionality and data creates value in the system, and an adversary causing a threat negates that value, which forms the basis for risk. Risk is offset by controls, which cover functional capabilities of a system as well as operational and organizational behaviors of the teams that design and build the system, and is modified by probabilities—the expectations of an attacker wishing to cause harm and the likelihood they will be successful should they attempt to do so.

Each concept and term requires additional explanation to be meaningful:

Weakness

A weakness is an underlying defect that modifies behavior or functionality (resulting in incorrect behavior) or allows unverified or incorrect access to data. Weaknesses in system design result from failure to follow best practices, standards, or conventions, and lead to some undesirable effect on the system. Luckily for threat modelers (and development teams), a community initiative—Common Weakness Enumeration (CWE)—has created an open taxonomy of security weaknesses that can be referenced when investigating system design for concerns.

Exploitability

Exploitability is a measure of how easily an attacker can make use of a weakness to cause harm. Put another way, exploitability is the amount of exposure that the weakness has to external influence.12

Vulnerability

When a weakness is exploitable (exploitability outside the local authorization context is nonzero), it is known as a vulnerability. Vulnerabilities provide a means for an adversary with malicious intent to cause some sort of damage to a system. Vulnerabilities that exist in a system but are as yet undiscovered are known as zero-day vulnerabilities. Zero days are no more or less dangerous than other vulnerabilities of a similar nature but are special because they are likely to be unresolved, and therefore the potential for exploitation may be elevated. As with weaknesses, community efforts have created a taxonomy of vulnerabilities, encoded in the CVE database.

Severity

Weaknesses lead to an impact on a system and its assets (functionality and/or data); the damage potential and “blast radius” from such an issue is described as the defect’s severity. For those whose primary profession is or has been in any field of engineering, severity may be a familiar term. Vulnerabilities—exploitable weaknesses—are by definition at least as severe as the underlying defect, and more often the severity of a defect is increased because it is open to being exploited. Methods for calculating severity are described in “Calculating Severity or Risk”.

Note

Unfortunately, the process of determining the severity of a weakness is not always so cut and dried. If the ability to leverage the defect to cause an impact is unrecognized at the time of discovery of the weakness, how severe is the issue? What happens if the defect is later determined to be exposed, or worse becomes exposed as a result of a change in the system design or implementation? These are hard questions to answer. We’ll touch on this later when we introduce risk concepts.

Impact

If a weakness or vulnerability is exploited, it will result in some sort of impact to the system, such as breaking functionality or exposing data. When rating the severity of an issue, you will want to assess the level of impact as a measure of potential loss of functionality and/or data as the result of successful exploitation.

Actor

When describing a system, an actor is any individual associated with the system, such as a user or an attacker. An actor with malicious intent, either internal or external to the organization, creating or using the system, is sometimes referred to as an adversary.

Threat

A threat is the result of a nonzero probability of an attacker taking advantage of a vulnerability to negatively impact the system in a particular way (commonly phrased in terms of “threat to…” or “threat of…”).

Threat event

When an adversary makes an attempt (successful or not) to exploit a vulnerability with an intended objective or outcome, this becomes known as a threat event.

Loss

For the purpose of this book and the topic of threat modeling, loss occurs when one (or more) impacts affect functionality and/or data as a result of an adversary causing a threat event:

  • The actor is able to subvert the confidentiality of a system’s data to reveal sensitive or private information.

  • The actor can modify the interface to functionality, change the behavior of functionality, or change the contents or provenance of data.

  • The actor can prevent authorized entities from accessing functionality or data, either temporarily or permanently.

Loss is described in terms of an asset or an amount of value.

Risk

Risk combines the value of the potentially exploited target with the likelihood an impact may be realized. Value is relative to the system or information owner, as well as to the attacker. You should use risk to inform priority of an issue, and to decide whether to address the issue. Severe vulnerabilities that are easy to exploit, and those that lead to significant damages through loss, should be given a high priority to mitigate.

Calculating Severity or Risk

Severity (the amount of damage that can be caused by successful exploitation of a vulnerability) and risk (the combination of the likelihood of initiation of a threat event and the likelihood of success in generating a negative impact as a result of exploitation) can be determined formulaically. The formulas are not perfect, but using them provides consistency. Many methods exist today for determining severity or risk, and some threat modeling methodologies use alternative risk-scoring methods (not described in this book). A sample of three popular methods in general use (one for measuring severity, two for risk) is presented here.

CVSS (severity)

The Common Vulnerability Scoring System (CVSS) is now in version 3.1, and is a product of the Forum of Incident Response and Security Teams (FIRST).

CVSS is a method for establishing a value from 0.0 to 10.0 that allows you to identify the components of severity. The calculation is based on the likelihood of successful exploitation of a vulnerability and a measurement of the potential impact (or damage). Eight metrics, or values, are set in the calculator to derive a severity rating, as shown in Figure I-3.

Figure I-3. Common Vulnerability Scoring System metrics, vector, and score

Likelihood of success is measured on specific metrics that are given a numeric rating. This results in a value known as the exploitability subscore. Impact is measured similarly (with different metrics) and is known as the impact subscore. Added together, the two subscores result in an overall base score.
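To make that arithmetic concrete, here is a simplified Python sketch of the CVSS v3.1 base score for the scope-unchanged case, using the coefficient values published in the FIRST specification. It is an illustration only—the base score is essentially the sum of the two subscores, capped at 10 and rounded up—and the official calculator should be treated as authoritative.

```python
import math

# Metric weights from the CVSS v3.1 specification (scope unchanged).
ATTACK_VECTOR     = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.20}
ATTACK_COMPLEXITY = {"L": 0.77, "H": 0.44}
PRIV_REQUIRED     = {"N": 0.85, "L": 0.62, "H": 0.27}
USER_INTERACTION  = {"N": 0.85, "R": 0.62}
CIA_IMPACT        = {"H": 0.56, "L": 0.22, "N": 0.0}

def roundup(value):
    """CVSS 'round up to one decimal place' helper."""
    return math.ceil(value * 10) / 10

def base_score(av, ac, pr, ui, c, i, a):
    exploitability = (8.22 * ATTACK_VECTOR[av] * ATTACK_COMPLEXITY[ac]
                      * PRIV_REQUIRED[pr] * USER_INTERACTION[ui])
    iss = 1 - (1 - CIA_IMPACT[c]) * (1 - CIA_IMPACT[i]) * (1 - CIA_IMPACT[a])
    impact = 6.42 * iss
    if impact <= 0:
        return 0.0
    # Scope unchanged: base score is the capped, rounded-up sum of the two subscores.
    return roundup(min(impact + exploitability, 10))

# Network-reachable, low complexity, no privileges or user interaction required,
# high impact to C, I, and A (vector AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H).
print(base_score("N", "L", "N", "N", "H", "H", "H"))  # prints 9.8
```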

Tip

Remember, CVSS does not measure risk but severity. CVSS can tell you the likelihood that an attacker will succeed in exploiting the vulnerability of an impacted system, and the amount of damage they can do. But it cannot indicate when or if an attacker will attempt to exploit the vulnerability. Nor can it tell you how much the impacted resource is worth or how expensive it will be to address the vulnerability. It is the likelihood of the initiation of an attack, the value of the system or functionality, and the cost to mitigate it that drives the risk calculation. Relying on raw severity is a good way to communicate information about a defect, but is a very imperfect way to manage risk.

DREAD (risk)

DREAD is an older,13 yet foundationally important, method for understanding the risk from security concerns. DREAD is the partner to the STRIDE threat modeling methodology; STRIDE is discussed in depth in Chapter 3.

DREAD is an acronym for:

Damage

If an adversary conducted an attack, how much destruction could they cause?

Reproducibility

Is a potential attack easily reproduced (in method and effect)?

Exploitability

How easy is it to conduct a successful attack?

Affected users

What percentage of the user population might be impacted?

Discoverability

If the adversary does not already know of the potential for an attack, what is the likelihood they can discover it?

DREAD is a process for documenting the characteristics of a potential attack against a system (via a vector, by an adversary) and arriving at a value that can be compared to values for other attack scenarios and/or threat vectors. The risk score for any given attack scenario (a combination of a security vulnerability and an adversary) is calculated by considering the characteristics of exploitation of the vulnerability by the attacker and assigning each dimension (i.e., D, R, E, A, D) a score—for example, 1, 2, or 3 for low-, medium-, and high-rated issues, respectively.

The total of the scores across the dimensions determines the overall risk value. For example, an arbitrary security issue in a particular system may have scores of [D = 3, R = 1, E = 1, A = 3, D = 2] for a total risk value of 10. To have meaning, this risk value can be compared to other risks that are identified against this particular system; it is less useful to attempt to compare this value with values from other systems, however.
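A minimal sketch of how such DREAD totals might be tallied and ranked within a single system follows; the scenarios and the per-dimension scores are invented purely for illustration.

```python
# Each scenario is scored 1 (low) to 3 (high) on the five DREAD dimensions:
# Damage, Reproducibility, Exploitability, Affected users, Discoverability.
scenarios = {
    "hardcoded admin password usable over the network": (3, 3, 2, 3, 2),
    "verbose stack trace leaked in an error page":      (1, 3, 3, 2, 3),
    "race condition corrupting a single user's cart":   (2, 1, 1, 1, 1),
}

# Rank scenarios for this system by total DREAD score, highest risk first.
for name, scores in sorted(scenarios.items(), key=lambda kv: sum(kv[1]), reverse=True):
    print(f"{sum(scores):>2}  {name}")
```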

FAIR Method for risk quantification (risk)

The Factor Analysis of Information Risk (FAIR) method is gaining popularity among executive types because it offers the right level of granularity with more specificity to enable more effective decision making. FAIR is published by the Open Group and is included in ISO/IEC 27005:2018.

DREAD is an example of a qualitative risk calculation. FAIR is an international standard for quantitative risk modeling and for understanding the impact to assets from threats using measurements of value (hard and soft currency costs) and probability of realization (occurrences, or threat events) of a threat by an actor. Use these quantitative values to describe to your management and business leaders the financial impact to the business from risks identified in your systems, and compare them against the cost to defend against threat events. Proper risk management practices suggest the cost to defend should not exceed the value of the asset, or the potential loss of an asset. This is also known as the $50 lock on a $5 pen paradigm.
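FAIR itself relies on calibrated estimates and Monte Carlo simulation, but the underlying idea—compare an annualized loss expectancy against the cost of defending—can be sketched in a few lines. The figures below are invented for illustration only.

```python
# Invented figures for illustration only.
threat_events_per_year = 4        # how often an adversary attempts the attack
probability_of_success = 0.10     # chance an attempt becomes a loss event
loss_per_event         = 250_000  # primary plus secondary loss, in dollars
annual_cost_of_control = 60_000   # e.g., licensing plus operational effort
control_effectiveness  = 0.80     # fraction of loss events the control prevents

annualized_loss = threat_events_per_year * probability_of_success * loss_per_event
expected_savings = annualized_loss * control_effectiveness

print(f"Annualized loss expectancy: ${annualized_loss:,.0f}")
print(f"Expected savings from the control: ${expected_savings:,.0f} "
      f"versus its cost of ${annual_cost_of_control:,.0f}")
# Spending $60,000 to avoid an expected $80,000 of loss is defensible; spending it
# to protect a $5,000 asset would be the $50 lock on the $5 pen.
```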

Warning

FAIR is thorough and accurate, but also complex, and requires specialized knowledge to perform the calculations and simulations correctly. This is not something you want to do live in a threat modeling review session, nor something you want to foist on your security subject matter experts (SMEs), if you have them. Security experts have expertise in finding weaknesses and threats, not in modeling financial impact valuations. Hiring individuals with skills in computational methods and financial modeling, or finding a tool that does the hard math for you, is a better course of action if you plan to adopt FAIR.

Core Properties

Three core properties—confidentiality, integrity, and availability—form the foundation on which all other things in security are built. When someone wants to know whether something is secure, these properties and whether they are intact determine the response. These properties support a key goal: trustworthiness. In addition, two further properties—privacy and safety—are related to the first three but have slightly different focuses.

Confidentiality

A system has the property of confidentiality only if it guarantees access to the data entrusted to it exclusively to those who have the appropriate rights, based on their need to know the protected information. A system that does not have a barrier stopping unauthorized access fails to safeguard confidentiality.14

Integrity

Integrity exists when the authenticity of data or operations can be verified and the data or functionality has not been modified or made unauthentic through unauthorized activity.15

Availability

Availability means authorized actors are able to access system functionality and/or data whenever they have the need or desire to do so. In certain circumstances, a system’s data or operations may not be available as a result of a contract or agreement between users and system operators (such as a website being down for regular maintenance); if the system is unavailable because of a malicious action by an adversary, availability will have been compromised.16

Privacy

While confidentiality refers to the controlled access to private information shared with others, privacy refers to the right not to have that information exposed to unauthorized third parties. Many times when people talk about confidentiality, they really expect privacy; although the terms are often used interchangeably, they are not the same concept. You could argue that confidentiality is a prerequisite to privacy. For example, if a system cannot guarantee the confidentiality of the data it stores, that system can never provide privacy to its users.

Safety

Safety is “freedom from unacceptable risk of physical injury or of damage to the health of people, either directly, or indirectly as a result of damage to property or to the Environment.”17 Naturally, for something to meet safety requirements, it has to operate in a predictable manner. This means that it must at least maintain the security properties of integrity and availability.

Fundamental Controls

The following controls, or functional behaviors and capabilities, support the development of highly secure systems.

Identification

Actors in a system must be assigned a unique identifier meaningful to the system. Identifiers should also be meaningful to the individuals or processes that will consume the identity (e.g., the authentication subsystem; authentication is described next).

An actor is anything in a system (including human users, system accounts, and processes) that has influence over the system and its functions, or that wishes to gain access to the system’s data. To support many security objectives, an actor must be granted an identity before it can operate on that system. This identity must come with information that allows a system to positively identify the actor—or in other words, to allow the actor to show proof of identity to the system. In some public systems, unnamed actors or users are also identified, indicating that their specific identity is not important but is still represented in the system.

Tip

Guest is an acceptable identity on many systems as a shared account. Other shared accounts may exist, although use of shared accounts should be carefully considered as they lack the ability to trace and control actor behavior on an individual basis.

Authentication

Actors with identities need to prove their identity to the system. Identity is usually proven by the use of a credential (such as a password or security token).

All actors who wish to use the system must be able to satisfactorily provide proof of their identity so that the target system can verify that it is communicating with the right actor. Authentication is a prerequisite for additional security capabilities.

Authorization

Once an actor has been authenticated—that is, their identity has been proven satisfactorily—the actor can be granted privileges within the system to perform operations or access functionality or data. Authorization is contextual, and may be, but is not required to be, transitive, bidirectional, or reciprocal in nature.

With authentication comes the ability for a system, based on the offered proof of identification provided by an actor, to specify the rights of that actor. For example, once a user has authenticated into a system and is allowed to perform operations in a database, access to that database is granted based only on the actor’s rights. Access is usually granted in terms of primitive operations such as read, write, or execute. Access-control schemes that govern an actor’s behavior within a system include the following:

Mandatory access control (MAC)

The system constrains the authorizations for actors.

Discretionary access control (DAC)

Actors can define privileges for operations.

Role-based access control (RBAC)

Actors are grouped into meaningful “roles,” and roles define privilege assignments; a minimal sketch of this scheme follows the list.

Capability-based access control

An authorization subsystem assigns rights through tokens that actors must request (and be granted) in order to perform operations.
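Of these schemes, RBAC is among the most widely used. Here is a minimal sketch in Python; the users, roles, and operations are invented for illustration, and a real system would enforce the check at every access to the protected resource.

```python
# Role -> set of permitted primitive operations on the protected resource.
ROLE_PERMISSIONS = {
    "auditor":  {"read"},
    "clerk":    {"read", "write"},
    "operator": {"read", "write", "execute"},
}

# Identity -> role assignment, established after the actor has authenticated.
USER_ROLES = {"alice": "clerk", "bob": "auditor"}

def is_authorized(user, operation):
    """Check on every access, not just the first one."""
    role = USER_ROLES.get(user)
    return operation in ROLE_PERMISSIONS.get(role, set())

print(is_authorized("alice", "write"))   # True: clerks may write
print(is_authorized("bob", "write"))     # False: auditors are read-only
print(is_authorized("mallory", "read"))  # False: unknown identities get nothing
```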

Tip

Guest accounts are usually not authenticated (there is no identity to prove), but these accounts may be authorized explicitly with a minimal level of capability.

Logging

When an actor (human or process) performs a system operation, such as executing a feature or accessing data stores, that event should be recorded. This supports traceability. Traceability is important when trying to debug a system; when the recorded events are considered security relevant, traceability also supports critical tasks such as intrusion detection and prevention, forensics, and evidence collection (in the case of a malicious actor intruding upon a system).

Auditing

Logging creates records; audit records are well-defined (in format and content), ordered in time, and usually tamper resistant (or at least tamper evident). The capability of “looking back in time” and understanding the order in which events occurred, who performed which operations, and when, and optionally to determine whether the operations were correct and authorized, is critical for security operations and incident response activities.
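One common way to make audit records tamper evident is to chain them together with a cryptographic hash, so that altering any earlier record invalidates everything recorded after it. A minimal sketch, assuming records are simple JSON-serializable entries:

```python
import hashlib
import json
import time

def append_record(log, actor, action):
    """Append an audit record whose hash covers the previous record's hash."""
    previous_hash = log[-1]["hash"] if log else "0" * 64
    record = {"time": time.time(), "actor": actor, "action": action, "prev": previous_hash}
    payload = json.dumps(record, sort_keys=True).encode()
    record["hash"] = hashlib.sha256(payload).hexdigest()
    log.append(record)

def verify(log):
    """Recompute the chain; any altered or reordered record breaks verification."""
    previous_hash = "0" * 64
    for record in log:
        body = {k: v for k, v in record.items() if k != "hash"}
        expected = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if record["prev"] != previous_hash or record["hash"] != expected:
            return False
        previous_hash = record["hash"]
    return True

audit_log = []
append_record(audit_log, "alice", "read customer 42")
append_record(audit_log, "bob", "export monthly report")
print(verify(audit_log))            # True
audit_log[0]["actor"] = "mallory"   # tamper with an earlier record...
print(verify(audit_log))            # False: the chain no longer verifies
```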

Basic Design Patterns for Secure Systems

When you are designing a system, you should keep certain security principles and methodologies in mind. Not every principle will apply to your system, but it is important to consider each one and ensure that those that do apply hold true.

In 1975, a seminal article by Jerome Saltzer and Michael Schroeder, “The Protection of Information in Computer Systems,”18 was published. Although much has changed since its publication, the basic tenets are still applicable. Some of the fundamentals we discuss in this book are based on the principles laid out by Saltzer and Schroeder. We also want to show you how some of those principles have become relevant in different ways than originally intended.

Zero trust

A common approach to system design, and to security compliance, is “trust, but verify”: assume the best outcome for an operation (such as a device joining a network, or a client calling an API) and then perform a verification of the trust relationship secondarily. Zero trust goes further: the system ignores (or never establishes) any prior trust relationship and instead verifies everything before deciding to establish a trust relationship (which may then be temporary).19

Also known as complete mediation, this concept looks amazingly simple on paper: ensure that the rights to perform an operation on an object are checked, before the access happens, every single time the object is accessed. In other words, you must verify that an actor has the proper rights to access an object every time that access is requested.

Note

John Kindervag created the concept of zero trust in 2010,20 and it has been commonly applied to network perimeter architecture discussions. The authors decided to import the concept into the security principles, and believe it also applies with no modifications to the security decisions that need to happen at the application level.

Design by contract

Design by contract is related to zero trust, and assumes that whenever a client calls a server, the input coming from that client will be of a certain fixed format and will not deviate from that contract.

It is similar to a lock-and-key paradigm. Your lock accepts only the correct key and trusts nothing else. In “Securing the Tangled Web,”21 Christoph Kern explains how Google has significantly reduced the amount of cross-site scripting (XSS) flaws in applications by using a library of inherently safe API calls—by design. Design by contract addresses zero trust by ensuring that every interaction follows a fixed protocol.
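As a minimal sketch of the idea at an API boundary, the following Python snippet accepts only requests that match a fixed, explicit contract and rejects everything else; the field names, types, and ranges are invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TransferRequest:
    """The contract: exactly these fields, with these types and ranges."""
    account_id: str
    amount_cents: int

def parse_request(raw):
    # Reject anything outside the contract rather than trying to "make it work."
    if set(raw) != {"account_id", "amount_cents"}:
        raise ValueError("unexpected or missing fields")
    if not isinstance(raw["account_id"], str) or not raw["account_id"].isalnum():
        raise ValueError("account_id must be alphanumeric")
    if not isinstance(raw["amount_cents"], int) or not 0 < raw["amount_cents"] <= 1_000_000:
        raise ValueError("amount_cents out of range")
    return TransferRequest(raw["account_id"], raw["amount_cents"])

print(parse_request({"account_id": "abc123", "amount_cents": 5000}))
try:
    parse_request({"account_id": "abc123", "amount_cents": 5000, "is_admin": True})
except ValueError as err:
    print("rejected:", err)
```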

Least privilege

This principle means that an operation should run using only the most restrictive privilege level that still enables the operation to succeed. In other words, in all layers and in all mechanisms, make sure that your design restricts the operator to the minimum level of access required to accomplish an individual operation, and nothing more.

If least privilege is not followed, a vulnerability in an application might offer full access to the underlying operating system, and with it all the consequences of a privileged user having unfettered access to your system and your assets. This principle applies for every system that maintains an authorization context (e.g., an operating system, an application, databases, etc.).
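A classic illustration on a Unix-like system, sketched in Python: perform the one step that genuinely requires elevated privileges (binding a port below 1024), then immediately drop to an unprivileged account before handling any input. The account name and port are assumptions for illustration, and the script must be started as root for the drop to take effect.

```python
import os
import pwd
import socket

def bind_privileged_port(port=443):
    """The only step that genuinely needs root: binding a port below 1024."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("0.0.0.0", port))
    server.listen()
    return server

def drop_privileges(username="nobody"):
    """Give up root as soon as the privileged work is done."""
    if os.getuid() != 0:
        return  # already unprivileged; nothing to drop
    entry = pwd.getpwnam(username)
    os.setgid(entry.pw_gid)  # drop the group first, then the user
    os.setuid(entry.pw_uid)

if __name__ == "__main__":
    listener = bind_privileged_port()
    drop_privileges()
    # From here on, a compromise of the request-handling code yields only the
    # unprivileged account, not root.
    print("serving as uid", os.getuid())
```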

Defense in depth

Defense in depth uses a multifaceted and layered approach to defend a system and its assets.

When thinking about defense of your system, think about the things you want to protect—assets—and how an attacker might try to access them. Consider what controls you might put in place to limit or prevent access by an adversary (but allow access by a properly authorized actor). You might consider parallel or overlapping layers of controls to slow down the attacker; alternatively you might consider implementing features that confuse or actively deter an adversary.

Examples of defense in depth applied to computer systems include the following:

  • Defending a specific workstation with locks, guards, cameras, and air-gapping

  • Introducing a bastion host (or firewall) between the system and the public internet, then an endpoint agent in the system itself

  • Using multifactor authentication to supplement a password system for authentication, with a time delay that raises exponentially between unsuccessful attempts

  • Deploying a honeypot and fake database layer with intentionally priority-limited authentication validation functions

Any additional factor that acts as a “bump in the road” and makes an attack costlier in terms of complexity, money, and/or time is a successful layer in your defense in depth. This way of evaluating defense-in-depth measures is related to risk management—defense in depth does not always mean defense at all costs. A balancing act occurs between deciding how much to spend to secure assets versus the perceived value of those assets, which falls into scope of risk management.

Keeping things simple

Keeping things simple is about avoiding overengineering your system. With complexity comes an increased potential for instability, for challenges in maintenance and other aspects of system operation, and for ineffectual security controls.22

Care must be taken to avoid oversimplification as well (as in dropping or overlooking important details). This often happens in input validation: we assume (correctly or not) that an upstream data generator will always supply valid and safe data, and (incorrectly) skip our own input validation in an effort to simplify things. For a more extensive discussion of these expectations, see Brook S. E. Schoenfield’s work on security contracts.23 At the end of the day, a clean, simple design will often provide security advantages over an overengineered one, and should be given preference.

No secret sauce

Do not rely on obscurity as a means of security. Your system design should be resilient to attack even if every single detail of its implementation is known and published. Notice, this doesn’t mean you need to publish it,24 and the data on which the implementation operates must remain protected—it just means you should assume that every detail is known, and not rely on any of it being kept secret as a way to protect your assets. If you intend to protect an asset, use the correct control—encryption or hashing; do not hope an actor will fail to identify or discover your secrets!
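For example, rather than hoping an attacker never finds a stored password, apply the correct control. A minimal sketch using Python’s standard library (a salted, deliberately slow key-derivation hash) follows; a production system would typically use a maintained password-hashing library and its recommended parameters.

```python
import hashlib
import hmac
import os

def hash_password(password):
    """Derive a salted hash; store the salt and digest, never the password."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
    return salt, digest

def verify_password(password, salt, expected):
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
    return hmac.compare_digest(candidate, expected)  # constant-time comparison

salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))  # True
print(verify_password("guess", salt, stored))                         # False
```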

Separation of privilege

Also referred to as separation of duties, this principle means segregating access to functionality or data within your system so one actor does not hold all rights. Related concepts include maker/checker, where one user (or process) may request an operation to occur and set the parameters, but another user or process is required to authorize the transaction to proceed. This means a single entity cannot perform malicious activities unimpeded or without the opportunity for oversight, and raises the bar for nefarious actions to occur.
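A minimal maker/checker sketch: one identity may propose a transaction, but a different identity must approve it before it executes. The class and names are invented for illustration.

```python
class PendingTransfer:
    """A transfer proposed by one actor (the maker) that requires a second actor (the checker)."""

    def __init__(self, maker, amount):
        self.maker = maker
        self.amount = amount
        self.approved_by = None

    def approve(self, checker):
        if checker == self.maker:
            raise PermissionError("separation of duties: the maker cannot approve their own transfer")
        self.approved_by = checker

    def execute(self):
        if self.approved_by is None:
            raise PermissionError("not yet approved by a second party")
        return f"transferred {self.amount} (made by {self.maker}, approved by {self.approved_by})"


transfer = PendingTransfer(maker="alice", amount=10_000)
try:
    transfer.approve("alice")  # rejected: the same identity cannot both make and check
except PermissionError as err:
    print(err)
transfer.approve("bob")        # a different identity approves
print(transfer.execute())
```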

Consider the human factor

Human users have been referred to as the weakest link in any system,25 so the concept of psychological acceptability must be a basic design constraint. Users who are frustrated by strong security measures will inevitably try to find ways around them.

When developing a secure system, it is crucial to decide just how much security will be acceptable to the user. There’s a reason we have two-factor authentication and not sixteen-factor authentication. Put too many hurdles between a user and the system, and one of these situations will occur:

  • The user stops using the system.

  • The user finds workarounds to bypass the security measures.

  • The powers that be stop supporting the decision for security because it impairs productivity.

Effective logging

Security is not only about preventing bad things from happening, but also about being aware that something happened and, to the extent possible, knowing what happened. The capability to see what happened comes from being able to effectively log events.

But what constitutes effective logging? From a security point of view, a security analyst needs to be able to answer three questions:

  • Who performed a specific action to cause an event to be recorded?

  • When was the action performed or event recorded?

  • What functionality or data was accessed by the process or user?
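A minimal sketch of structured event logging that captures those three answers, using Python’s standard logging module; the field names are assumptions for illustration.

```python
import json
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(message)s")
security_log = logging.getLogger("security")

def log_event(actor, action, target):
    """Record who did what, to which resource, and when."""
    security_log.info(json.dumps({
        "when":   time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "who":    actor,
        "what":   action,
        "target": target,
    }))

log_event(actor="alice", action="download", target="customer-report.csv")
log_event(actor="service:billing", action="update", target="invoice/1234")
```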

Nonrepudiation, which is closely related to integrity, means having a set of transactions indicating who did what, with the record of each transaction maintaining integrity as a property. With this concept, it is impossible for an actor to claim they did not perform a specific action.

Warning

As important as it is to know what to log and how to protect it, knowing what not to log is also crucial. In particular:

  • Personally identifiable information (PII) should never be logged in plain text, in order to protect the privacy of user data.

  • Sensitive content that is part of API or function calls should never be logged.

  • Clear-text versions of encrypted content likewise should not be logged.

  • Cryptographic secrets, such as a password to a system or a key used to decrypt data, should not be logged.

Using common sense is important here, but note that keeping this kind of logging out of the code is an ongoing battle against the needs of development (mostly debugging). It is important to make it clear to development teams that it is unacceptable to have switches in code that control whether sensitive content is logged for debugging purposes. Deployable, production-ready code should not contain logging capabilities for sensitive information.

Fail secure

When a system encounters an error condition, this principle means not revealing too much information to a potential adversary (such as in logs or user error messages) and not simply granting access incorrectly, such as when the failure is in the authentication subsystem.

But it is important to understand that there is a significant difference between fail secure and fail safe. Failing while maintaining safety may contradict the condition of failing securely, and will need to be reconciled in the system design. Which one is appropriate in a given situation, of course, depends on the particulars of the situation. At the end of the day, failing secure means that if a component or logic in the system falters, the result is still a secure one.
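A minimal sketch of failing secure in an authentication path: any error during the check results in denial, the caller sees only a generic message, and the detail goes to a protected log. The function names are invented for illustration.

```python
import logging

log = logging.getLogger("auth")

def credentials_match(username, password):
    # Placeholder for a real credential-store lookup; it may raise during outages.
    raise ConnectionError("credential store unreachable")

def authenticate(username, password):
    try:
        return credentials_match(username, password)
    except Exception as err:
        # Fail secure: on any error, deny access and keep the detail out of the response.
        log.error("authentication error for %s: %s", username, err)
        return False

if not authenticate("alice", "secret"):
    print("Login failed.")  # generic message; no stack trace or internal detail leaks
```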

Build in, not bolt on

Security, privacy, and safety should be fundamental properties of the system, and any security features should be included in the system from the beginning.26

Security, like privacy or safety, should not be considered an afterthought or rely solely or primarily on external system components to be present. A good example of this pattern is the implementation of secure communications; the system must support this natively—i.e., should be designed to support Transport Layer Security (TLS) or a similar method for preserving confidentiality of data in transit. Relying on the user to install specialized hardware systems to enable end-to-end communications security means that if the user does not do so, the communications will be unprotected and potentially accessible to malicious actors. Do not assume that users will take action on your behalf when it comes to the security of your system.

Summary

After reading this Introduction, you should have all the foundational knowledge you need to get the most out of the chapters that follow: the basics of threat modeling and how it fits into the system development life cycle, and all the most important security concepts, terminology, and principles that are fundamental to understanding the security of your system. When you perform threat modeling, you will be looking for these security principles in your system’s design to ensure your system is properly protected from intrusion or compromise.

In Chapter 1 we discuss how to construct abstract representations of a system’s design to identify security or privacy concerns. In later chapters, we will introduce specific threat modeling methodologies that build on the concepts in this Introduction and the modeling techniques in Chapter 1 to perform complete security threat assessments using the threat modeling activity.

Welcome aboard the security train!

1 This phrase is attributed to Wilf Hey and to Army Specialist William D. Mellin.

2 “A cargo cult is a millenarian belief system in which adherents practice rituals which they believe will cause a more technologically advanced society to deliver goods.” Wikipedia, accessed 10/24/2020.

3 The LAMP stack consists of the collection of Linux OS, Apache web server, MySQL database, and PHP scripting language.

4 The MEAN stack consists of MongoDB, Express.js, Angular.js, and Node.js.

5 Arvinder Saini, “How Much Do Bugs Cost to Fix During Each Phase of the SDLC?,” Software Integrity Blog, Synopsys, January 2017, https://oreil.ly/NVuSf; Sanket, “Exponential Cost of Fixing Bugs,” DeepSource, January 2019, https://oreil.ly/ZrLvg.

6 “What Is Serverless Computing?,” Cloudflare, accessed November 2020, https://oreil.ly/7L4AJ.

7 Isaac Sacolick, “What Is CI/CD? Continuous Integration and Continuous Delivery Explained,” InfoWorld, January 2020, https://oreil.ly/tDc-X.

8 Barry Boehm, Software Engineering Economics (Prentice Hall, 1981).

9 Kayla Matthews, “What Do IoT Hacks Cost the Economy?,” IoT For All, October 2018, https://oreil.ly/EyT6e.

10 “What is WebAuthn?,” Yubico, https://oreil.ly/xmmL9.

11 Christoph Kern, “Preventing Security Bugs through Software Design,” USENIX, August 2015, https://oreil.ly/rcKL_.

12 “External” is relative when used here, and is specific to what is known as the authorization context; for example, the operating system, application, databases, etc.

13 Some say DREAD has outlived its usefulness; see Irene Michlin, “Threat Prioritisation: DREAD Is Dead, Baby?,” NCC Group, March 2016, https://oreil.ly/SJnsR.

14 NIST 800-53 Revision 4, “Security and Privacy Controls for Federal Information Systems and Organizations”: B-5.

15 NIST 800-53 Revision 4, “Security and Privacy Controls for Federal Information Systems and Organizations”: B-12.

16 NIST 800-160 vol 1, “Systems Security Engineering: Considerations for a Multidisciplinary Approach in the Engineering of Trustworthy Secure Systems”: 166.

17 “Functional Safety and IEC 61508,” International Electrotechnical Commission, https://oreil.ly/SUC-E.

18 J. Saltzer and M. Schroeder, “The Protection of Information in Computer Systems,” University of Virginia Department of Computer Science, https://oreil.ly/MSJim.

19 “Zero Trust Architecture,” National Cybersecurity Center of Excellence, https://oreil.ly/P4EJs.

20 Brook S. E. Schoenfield, expert threat modeling practitioner and prolific author, reminds us that the idea of “observe mutual distrust” was already posited by Microsoft in 2003–04, but unfortunately we were unable to locate a reference. We trust Brook!

21 Christoph Kern, “Securing the Tangled Web,” acmqueue, August 2014, https://oreil.ly/ZHVrI.

22 Eric Bonabeau, “Understanding and Managing Complexity Risk,” MIT Sloan Management Review, July 2007, https://oreil.ly/CfHAc.

23 Brook S. E. Schoenfield, Secrets of a Cyber Security Architect (Boca Raton, FL: CRC Press, 2019).

24 Except when using copyleft licenses and open source projects, of course.

25 “Humans Are the Weakest Link in the Information Security Chain,” Kratikal Tech Pvt Ltd, February 2018, https://oreil.ly/INf8d.

26 Some security features or functionality may have negative effects on usability, so it may be acceptable to disable some security capabilities by default if users can enable them when deploying the system.
