Chapter 1. Understanding Confidential Computing and Trust
The phrase “data is the new oil” makes perfect sense in today’s digital world. In recent years, we have seen widespread adoption of cloud computing, and with artificial intelligence (AI) making considerable advances, particularly in areas such as generative AI, data has become one of the most valuable assets. For many businesses, data is the key differentiator because it gives them a competitive edge and, in many instances, enables new entrants such as startups to challenge established incumbents.
However, as data becomes more important to businesses, it presents a challenge: How can we ensure that data is protected while computation is performed on it? Additionally, how can we be certain that the code that is conducting the computation has not been tampered with? Confidential computing addresses these challenges by providing assurances with hardware-based, attested trusted execution environments (TEEs), which provide data integrity and confidentiality as well as code integrity. In this way, they ensure that a high level of trust can be achieved even when data and code are running within a multi-tenant cloud environment.
TEEs isolate data in memory. The TEEs used by Azure encrypt data in memory using keys managed by the underlying CPU firmware. The firmware also prevents data in memory from being altered by any software running outside the TEE that owns the memory. TEEs can help prevent unauthorized access to data in memory from cloud operators and malicious third parties, and in some cases, they can help prevent unauthorized access to data in memory from insider threats such as a rogue tenant administrator. In addition to making data in memory in the cloud safer, confidential computing can enable a whole new class of multi-party, privacy-preserving applications called confidential clean rooms.
This chapter provides an understanding of confidential computing, beginning with its definition as published by the Confidential Computing Consortium (CCC), its key features, and an overview of how it helps complete the data protection lifecycle. It then discusses how vendor-specific confidential computing technologies allow you to choose the size of your TEE, and the types of threats it can address, by selecting the appropriate memory isolation level. Finally, it establishes confidential computing’s alignment with zero-trust principles.
In subsequent chapters, this report covers various use cases for confidential computing, the Azure confidential computing portfolio of products, and the road ahead for confidential computing on Azure.
Confidential Computing Overview
This section discusses the definition of confidential computing as established by the CCC, the key features of confidential computing, and how confidential computing helps complete the data protection lifecycle.
Confidential Computing Definition
The term confidential computing is defined by the CCC as “the protection of data in-use by performing computation in a hardware-based, attested trusted execution environment (TEE).” This definition is meant to be inclusive of and relevant to a wide range of computing models, including public, private (on-premises), and hybrid cloud, as well as edge computing, Internet of Things (IoT) devices, mobile devices, and even operational technology (OT) and more. Furthermore, it is vendor agnostic in the sense that it is not associated with a particular processor type or technology.
Key Features of Confidential Computing
The key features of confidential computing are a hardware root of trust, memory protection, and attestation.
Hardware root of trust
Hardware root of trust ensures that data encryption is performed using keys derived from a universally unique master key that is recorded in the CPU silicon itself when it is manufactured. This master key cannot be exported from the CPU and so is inaccessible from outside the CPU hardware. The keys used to encrypt data in memory are generated by the firmware running on the underlying TEE hardware and are also inaccessible outside the bounds of the CPU hardware. This allows all software outside of the TEE to be untrusted. If memory encryption were managed by software in the cloud service provider’s host OS or hypervisor, it would violate a key principle of confidential computing.
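As a rough illustration of the key hierarchy described above, the following Python sketch models a CPU whose master key is never returned by any public method and whose per-TEE memory keys are derived from it. The class name `SimulatedCpu`, its method names, and the choice of HMAC-SHA256 as the derivation function are assumptions made for illustration only; real CPUs do this in silicon and firmware using vendor-specific mechanisms.

```python
import hashlib
import hmac
import secrets


class SimulatedCpu:
    """Toy model of a CPU with a fused master key (hypothetical, illustration only)."""

    def __init__(self) -> None:
        # Stand-in for the unique key recorded in silicon at manufacture.
        # Python name mangling is NOT real protection; it only signals intent.
        self.__master_key = secrets.token_bytes(32)

    def __derive_memory_key(self, tee_id: str) -> bytes:
        # Assumed derivation: HMAC-SHA256 keyed with the master key.
        return hmac.new(self.__master_key, tee_id.encode(), hashlib.sha256).digest()

    def tag_memory(self, tee_id: str, data: bytes) -> bytes:
        """Use the TEE's derived key without ever returning the key itself."""
        key = self.__derive_memory_key(tee_id)
        return hmac.new(key, data, hashlib.sha256).digest()


cpu = SimulatedCpu()
tag_a = cpu.tag_memory("tee-a", b"secret page")
tag_b = cpu.tag_memory("tee-b", b"secret page")
```

In this sketch, software outside the class can ask for an operation to be performed with a derived key, but it has no method for obtaining either the master key or the derived keys, which mirrors the idea that all software outside the TEE can remain untrusted.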
Memory protection
Memory protection ensures that data is always encrypted when written to memory, using keys generated by the firmware that runs on the underlying TEE hardware and is inaccessible to cloud operators. Only the software in the TEE that owns that memory can write over it or read it, protecting it from unauthorized access or modification. Memory protection enables the following:
- Data and code confidentiality: This ensures that an unauthorized entity has no ability to view or access the data or code resident in the TEE memory. It practically eliminates the risk of leaking sensitive data such as personal data or intellectual property (IP), like AI models, that costs millions of dollars to develop.
- Data and code integrity: This ensures that an unauthorized entity has no ability to add, alter, or delete the data or code resident in the TEE memory. This is crucial because data and code integrity must be assured during computation; otherwise, the computation cannot be trusted.
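The two guarantees above can be sketched in software as an encrypt-then-MAC construction: encryption hides the contents (confidentiality), and a keyed tag exposes any modification (integrity). This is a toy under loud assumptions: the keystream built from SHA-256 is a teaching stand-in rather than a production cipher, the function names `protect` and `recover` are hypothetical, and real TEEs enforce these properties in hardware with vendor-specific mechanisms rather than with this scheme.

```python
import hashlib
import hmac
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream: SHA-256 over key, nonce, and a counter (NOT for production use).
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def protect(enc_key: bytes, mac_key: bytes, plaintext: bytes):
    """Encrypt the data, then compute a tag over the nonce and ciphertext."""
    nonce = secrets.token_bytes(16)
    stream = _keystream(enc_key, nonce, len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, stream))
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce, ciphertext, tag


def recover(enc_key: bytes, mac_key: bytes, nonce: bytes, ciphertext: bytes, tag: bytes) -> bytes:
    """Check the tag first; decrypt only if the contents are unmodified."""
    expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("memory contents were modified")
    stream = _keystream(enc_key, nonce, len(ciphertext))
    return bytes(c ^ k for c, k in zip(ciphertext, stream))


enc_key, mac_key = secrets.token_bytes(32), secrets.token_bytes(32)
nonce, ciphertext, tag = protect(enc_key, mac_key, b"model weights")
```

Flipping even one bit of `ciphertext` before calling `recover` raises `ValueError`, which is the software analogue of a computation refusing to proceed on tampered memory.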
Attestation
Attestation is a critical method for establishing trust over the hardware and software components of the TEE. A successful attestation ensures:
- The validity and trustworthiness of the TEE hardware. This establishes that the TEE is indeed based on the expected hardware (manufacturer, version, firmware, and so on) and that the associated memory protection functions are enabled.
- The validity and trustworthiness of the software executing inside the TEE. This establishes that the software has a valid identity (version, runtime attributes, and so on) and has not been tampered with.
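The two checks above can be sketched as a verifier that first confirms the evidence is authentic and then compares the reported hardware and software claims with an expected policy. Everything here is hypothetical: the report fields, the `Policy` type, and the function names are invented for illustration, and an HMAC with a shared demo key stands in for the asymmetric signatures and certificate chains that real attestation services rely on.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    allowed_platforms: frozenset
    min_firmware_version: int
    expected_code_hash: str


def sign_report(vendor_key: bytes, report: dict) -> bytes:
    # Stand-in for the hardware vendor's signature over the evidence.
    payload = json.dumps(report, sort_keys=True).encode()
    return hmac.new(vendor_key, payload, hashlib.sha256).digest()


def verify_attestation(vendor_key: bytes, report: dict, signature: bytes, policy: Policy) -> list:
    """Return a list of failures; an empty list means attestation succeeded."""
    if not hmac.compare_digest(sign_report(vendor_key, report), signature):
        return ["evidence signature is invalid"]
    failures = []
    # Hardware claims: expected platform, firmware level, memory protection on.
    if report.get("platform") not in policy.allowed_platforms:
        failures.append("unexpected TEE hardware")
    if report.get("firmware_version", -1) < policy.min_firmware_version:
        failures.append("firmware too old")
    if not report.get("memory_protection_enabled", False):
        failures.append("memory protection disabled")
    # Software claim: the measured code must match the expected hash.
    if report.get("code_hash") != policy.expected_code_hash:
        failures.append("code measurement mismatch")
    return failures


vendor_key = b"demo-vendor-key"  # hypothetical shared key for the demo
code_hash = hashlib.sha256(b"print('workload')").hexdigest()
policy = Policy(frozenset({"example-tee"}), 3, code_hash)
report = {
    "platform": "example-tee",
    "firmware_version": 4,
    "memory_protection_enabled": True,
    "code_hash": code_hash,
}
signature = sign_report(vendor_key, report)
```

Returning a list of failures instead of a single boolean is a design choice for the sketch: it lets a caller log exactly which claim did not match the policy.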
Confidential Computing in the Data Protection Lifecycle
One of the main goals of security is to keep data protected from unauthorized access, which can result from a variety of threats, including rogue administrators, rogue datacenter operators, and external threats such as hackers and corporate spies. The need for data protection manifests itself under three primary conditions: while data is at rest, in transit, and in use. While techniques for protecting data at rest and in transit, such as encryption, have matured over time, protecting data while in use during computation has proved much more difficult. For instance, data in transit can be protected with the TLS protocol, and data at rest can be protected with well-established cryptographic methods, such as those prescribed by FIPS-140.
However, with the widespread adoption of cloud computing, the need to protect data while it is in use in memory has grown drastically, leading to the emergence of various technologies, including confidential computing. Figure 1-1 shows how confidential computing helps fill a gap in the overall data protection lifecycle.
In a nutshell, the primary goal of confidential computing is to protect the data in use during computation by using a hardware-based, attested TEE. We will discuss why attestation is crucial for confidential computing in the next section. For now, let’s examine some of the key characteristics of a TEE and the role it plays in confidential computing.
Note
Several other techniques protect data while it is in use during computation, with varying degrees of maturity, including but not limited to homomorphic encryption (HE), secure multiparty computation (SMPC), and zero-knowledge proofs (ZKPs). See Appendix B for additional resources on privacy-enhancing technologies.
Memory Isolation Levels
The CCC definition of confidential computing is vendor agnostic and does not specify what is included or excluded from the TEE’s protected memory, which is encrypted with a single CPU-generated key. Varying confidential computing implementations from the CPU vendors add a new dimension to the TEE: the level of memory isolation.
Currently, CPU vendors offer three levels of TEE memory isolation: virtual machine level, container group level, and code level (down to the level of a code fragment). Each level gets its own unique memory encryption key assigned by the underlying CPU firmware. These three levels of isolation are summarized in Table 1-1.
| Level of isolation | Technologies | Ease of migration | Protection from cloud operators and malicious third parties | Protection from insider threat |
|---|---|---|---|---|
| Virtual machine | Intel® TDX [a], AMD® SEV-SNP [b] | Lift and shift | Yes | Not by default |
| Container group | AMD® SEV-SNP | Lift and shift | Yes | Yes |
| Application code | Intel® SGX [c] | Code refactoring required | Yes | Yes |

[a] Intel® Trust Domain Extensions (TDX)
[b] 3rd Gen AMD® EPYC™ processors with Secure Encrypted Virtualization–Secure Nested Paging (SEV-SNP)
[c] Intel® Software Guard Extensions (SGX)
Note that while virtual machine-level isolation supports the lift and shift of existing workloads, it does not, by default, protect any code or data within the virtual machine (VM) from insider threats such as a rogue or fallible user with administrator privileges. Also note that while code-level isolation offers the smallest code base in the TEE, that level of isolation comes at the cost of having to refactor code to use open source SDKs for implementation. For this reason, container group-level isolation holds great promise for more easily protecting cloud-native workloads from even insider threats.
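The trade-offs in Table 1-1 can be encoded as a small lookup that filters isolation levels by two requirements: whether insider-threat protection is needed by default, and whether the team can refactor code. The data comes from the table; the `IsolationLevel` type and the `candidate_levels` function are hypothetical names used only for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IsolationLevel:
    name: str
    technologies: tuple
    lift_and_shift: bool
    insider_protection_by_default: bool


# Rows transcribed from Table 1-1.
LEVELS = (
    IsolationLevel("Virtual machine", ("Intel TDX", "AMD SEV-SNP"), True, False),
    IsolationLevel("Container group", ("AMD SEV-SNP",), True, True),
    IsolationLevel("Application code", ("Intel SGX",), False, True),
)


def candidate_levels(need_insider_protection: bool, can_refactor: bool) -> list:
    """Return the names of the isolation levels that satisfy both requirements."""
    return [
        level.name
        for level in LEVELS
        if (level.insider_protection_by_default or not need_insider_protection)
        and (level.lift_and_shift or can_refactor)
    ]


print(candidate_levels(need_insider_protection=True, can_refactor=False))
```

With insider protection required and no refactoring budget, only the container group level remains, which matches the observation above about cloud-native workloads.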
Confidential Computing and Zero Trust
The term zero trust was first coined by Forrester in 2009, although its roots may be traced all the way back to “de-perimeterisation,” as discussed in publications such as the Defense Information Systems Agency (DISA) Black Core and the Jericho Forum Commandments. In recent years, zero trust has gained broad support. Its significance is demonstrated by the fact that it is an integral part of the national cybersecurity strategy of the United States (and others), as stated in memorandum M-22-09, “Moving the U.S. Government Toward Zero Trust Cybersecurity Principles.” The philosophy behind zero-trust security is that “security” cannot be inferred by virtue of a perimeter. Instead, we should embrace a security approach that is deeply rooted in a never-trust, always-verify mindset. This section discusses the three key principles of zero trust and the importance of hardware root of trust, attestation, and TEE size in addressing these principles.
Zero-Trust Principles
To understand zero trust and its guiding principles, let’s examine the definition provided by NIST: “Zero trust (ZT) provides a collection of concepts and ideas designed to minimize uncertainty in enforcing accurate, least privilege per-request access decisions in information systems and services in the face of a network viewed as compromised.”
This definition can be condensed into Figure 1-2, which highlights three key zero-trust principles.
Let’s briefly review each of these principles.
Verify explicitly
The principle of verify explicitly states that authorization decisions must always be based on the full context of the access request and should be enforced each time a request is made. The full context includes attributes such as user or workload identity, geolocation, device compliance status, endpoint or service requested, data classification, and historical access pattern for detecting anomalous behavior, among others. Explicit verification, a key tenet of zero trust, represents a significant departure from the commonly used security approach known as perimeter-based security, which has a more relaxed security posture and allows entities to access resources within the perimeter (e.g., applications, services, etc.) without verifying the full context of each request. In essence, the principle of explicit verification achieves a higher security posture by requiring every access request to be explicitly scrutinized.
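A minimal sketch of explicit verification is a function that evaluates the full context of every request and never caches an earlier decision. The attribute names follow the examples in the paragraph above, but the `AccessRequest` type, the rule sets, and the specific rules (for instance, that restricted data may be reached only from one region) are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    identity: str
    geolocation: str
    device_compliant: bool
    resource: str
    data_classification: str


ALLOWED_REGIONS = {"US", "CA"}            # hypothetical policy data
ENTITLEMENTS = {("alice", "payroll-db")}  # hypothetical (identity, resource) pairs


def authorize(request: AccessRequest) -> bool:
    """Evaluate every attribute on every call; nothing is trusted from earlier requests."""
    checks = (
        (request.identity, request.resource) in ENTITLEMENTS,
        request.geolocation in ALLOWED_REGIONS,
        request.device_compliant,
        # Hypothetical rule: restricted data may be accessed only from the US.
        request.data_classification != "restricted" or request.geolocation == "US",
    )
    return all(checks)


granted = authorize(AccessRequest("alice", "US", True, "payroll-db", "restricted"))
```

A perimeter-based design would, by contrast, check only whether the caller is inside the network and then skip these per-request checks.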
Use least-privilege access
The principle of least-privilege access stresses applying security techniques such as just-in-time and just-enough-access (JIT/JEA) to ensure that when access is granted, it is limited to what is necessary. This reduces the attack surface area by limiting access privileges and the access duration to the absolute minimums required to complete the task. In addition to JIT/JEA, machine learning–based risk policies that are adaptive in nature should be implemented to ensure least-privilege access is enforced consistently throughout the environment.
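The JIT/JEA idea can be sketched as a grant that carries both a narrow set of permitted actions (just enough access) and an expiry time (just in time). The `Grant` type and the helper names are hypothetical, and the sketch leaves out the adaptive, risk-based policies mentioned above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Grant:
    identity: str
    actions: frozenset    # just enough access: only the listed actions
    expires_at: datetime  # just in time: the grant lapses on its own


def issue_grant(identity: str, actions, minutes: int, now: datetime) -> Grant:
    return Grant(identity, frozenset(actions), now + timedelta(minutes=minutes))


def is_allowed(grant: Grant, identity: str, action: str, now: datetime) -> bool:
    return (
        grant.identity == identity
        and action in grant.actions
        and now < grant.expires_at
    )


start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
grant = issue_grant("alice", {"read"}, minutes=30, now=start)
```

Passing `now` explicitly rather than reading the clock inside the functions keeps the sketch deterministic and easy to test.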
Assume breach
The principle of assume breach focuses on operating and protecting organizational resources with the assumption that an adversary is already present in the environment. This mindset leads to security measures such as “deny by default” and the application of thorough scrutiny to all users, devices, data flows, and access requests on a continual basis. Assuming a breach has occurred also emphasizes the necessity of logging, inspecting, and continuously monitoring any configuration changes, resource accesses, and network traffic for suspicious behavior.
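“Deny by default” combined with continuous logging can be sketched as a decision function that allows only explicitly listed requests and records every decision, including denials, for later inspection. The rule set, the in-memory `audit_log`, and the function name `decide` are hypothetical; a real deployment would send these records to a monitoring system.

```python
ALLOW_RULES = {("backup-service", "read", "storage")}  # hypothetical allow list

audit_log = []  # in-memory stand-in for a real audit or monitoring pipeline


def decide(identity: str, action: str, resource: str) -> bool:
    """Allow only explicitly listed requests and log every decision."""
    allowed = (identity, action, resource) in ALLOW_RULES
    audit_log.append(
        {"identity": identity, "action": action, "resource": resource, "allowed": allowed}
    )
    return allowed


first = decide("backup-service", "read", "storage")
second = decide("backup-service", "delete", "storage")
```

Because anything absent from `ALLOW_RULES` is refused, a new or unexpected request is denied and logged without the policy needing to anticipate it.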
Importance of Hardware Root of Trust for Zero Trust
The hardware root of trust serves as a bedrock for the security of any confidential computing platform. According to NIST, the root of trust is “a starting point that is implicitly trusted.” The hardware root of trust is typically a silicon chip, or set of chips, specifically designed to be highly tamper resistant. Once the chips are produced, they are immutable, so their trustworthiness cannot practically be compromised after they are deployed in the field (though that doesn’t mean it is impossible). This trust stems from the fact that silicon chips first measure themselves using cryptographic methods and provide assurances that they have not been tampered with, and these assurances can be verified at any time. The underlying idea is that a hardware chip establishes trust at the root level before any other component further down the boot chain is executed.
When organizations maintain dedicated datacenters in a private cloud, they control the hardware infrastructure and assume responsibility for managing the servers. However, when organizations adopt the public cloud, they increase their dependence on cloud operators for infrastructure management and datacenter operations. The hardware root of trust for TEEs helps ensure that customer data is kept private even when in use in a public cloud.
As its name indicates, hardware root of trust is established by hardware provided by the chip manufacturer instead of software provided by the cloud operator. This provides a separation of concerns that helps safeguard against various types of threats, including those that may arise when a cloud operator gains access to data while it is in use in memory. This is because the keys used to encrypt data in memory are generated by CPU firmware and unavailable to cloud operators. Furthermore, the attestation process helps ensure data and code integrity and confidentiality by preventing a workload from executing if it is migrated to hardware without memory protection enabled.
In this way, hardware root of trust along with attestation facilitate the implementation of the zero-trust principles of explicitly verifying and assuming breach. Organizations may always assume a compromise by anyone, including the cloud operator, and still maintain a high security posture by explicitly verifying the TEE and the code executing within it. This decoupling, which removes the need to trust the cloud operator hosting the infrastructure, is a significant step toward implementing zero trust.
The verification of the root of trust establishes that the hardware is authentic and is configured to support a TEE. It also establishes that the code running within the TEE has not been tampered with. This is achieved through attestation, which is discussed next.
Importance of Attestation for Zero Trust
Attestation is a key enabler for organizations implementing zero trust to move their confidential workloads to the cloud and enjoy the benefits of cloud scalability and performance. Without attestation, they must rely on the cloud operator’s word regarding the privacy safeguards in place to prevent unauthorized access to data while it is in use.
Attestation removes the need for an organization to implicitly trust the TEE hardware and the code running within it. It provides a mechanism for verification by expressly requesting cryptographic evidence that proves both the validity of the TEE itself (e.g., it is a valid hardware-based TEE with the required attributes) and that the code has not been altered (e.g., code hash).
Additionally, attestation facilitates the implementation of the zero-trust principles of explicitly verifying and assuming breach. Organizations may assume a compromise by anyone, including the cloud operator, and still maintain a high security posture by explicitly verifying the TEE and the code executing within it.
Memory Isolation Level and Zero Trust
When designing a secure system, reducing the size of the code in the TEE is desirable because it reduces the attack surface area available for malicious actors to exploit. Moreover, removing the guest OS (the OS within the VM) from the TEE, as is the case with code-level and container-level isolation, can help protect against insider threats. In this way, less is more when it comes to how much code is in the TEE. Any component that is a part of the TEE must also be able to provide assurances via attestation, using cryptographic methods, regarding its authenticity, that is, that its measurements at deployment time were as intended; otherwise, the other components in the system that depend on it won’t “trust” it.
In confidential computing, trust must be established all the way down to the silicon/CPU manufacturer, using the previously described concepts of hardware root of trust and attestation. These allow you to explicitly verify the state of the TEE components and that all access to and from the TEE adheres to the principle of least-privilege access. Various vendor-specific technologies enable reduction of what is in the TEE by providing memory isolation at different levels such as virtual machine level, container group level, and code level (even reducing it to a few lines of code). As a result, more software can be moved outside of the TEE and can be breached without affecting the data in the TEE.
Figure 1-3 highlights how the three zero-trust principles of assume breach, verify explicitly, and use least-privilege access align with the key features of confidential computing and the significance of the level of memory isolation in assuming what can be breached.
Confidential computing enables organizations to prevent unauthorized access to their data and code while they are in use. The hardware root of trust and attestation act as the bedrock of confidential computing, allowing organizations to implement zero trust while fully utilizing the public cloud because they have the ability to verify the cloud environment. In the next chapter, you will learn about the Azure confidential computing (ACC) platform, including its various features and supporting services that enable organizations to run a variety of confidential workloads on Azure.