Chapter 4. Protecting Your Data
The crown jewel in your cloud estate is the data that you store and process. Crafting and scaling a data security strategy would fill at least a book of its own, so in this chapter you'll see how to implement the fundamental building blocks on which such a strategy rests. In Chapter 1, you learned that you are only as strong as your weakest link. By embedding these recipes into the infrastructure fabric of your cloud estate, you can ensure that insufficient data encryption is not what causes a data breach.
Data exists in one of two states: at rest or in transit. The first nine recipes show how you should handle data in both states. The last three recipes are about data loss prevention: how you find where your valuable data is, and how you verify it has the level of protection it warrants. When you have a cloud estate actively leveraged by hundreds of teams and thousands of engineers, it is beyond any one unaided human to stay on top of what data is where. Instead, you'll see how to use tooling as the force multiplier needed to manage data at scale.
So let’s dive into the wonderful world of encrypting your data in the cloud.
4.1 Encrypting Data at Rest on GCP
Solution
If you haven’t already done so, familiarize yourself with Terraform and the different authentication mechanisms in Chapter 11.
Create a variables.tf file and copy the following contents:
variable "project_id" {
  type        = string
  description = "The project to create the resources in"
}

variable "region" {
  type        = string
  description = "The region to create the resources in"
}
Then fill out the corresponding terraform.tfvars file:
project_id = ""
region     = ""
Create the following provider.tf file and run terraform init:

provider "google" {
  project = var.project_id
  region  = var.region
}

provider "google-beta" {
  project = var.project_id
  region  = var.region
}

terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 3"
    }
    google-beta = {
      source  = "hashicorp/google-beta"
      version = "~> 3"
    }
    null = {
      source  = "hashicorp/null"
      version = "~> 3"
    }
  }
}
Create the following main.tf file:
resource "google_project_service" "cloud_kms" {
  service = "cloudkms.googleapis.com"
}

resource "google_kms_key_ring" "keyring" {
  name     = "sensitive-data-keyring"
  location = var.region

  depends_on = [google_project_service.cloud_kms]
}

resource "google_kms_crypto_key" "key" {
  name     = "sensitive-data-key"
  key_ring = google_kms_key_ring.keyring.id
}
To create an encrypted Cloud Compute Disk, add the following resources to main.tf:
resource "google_service_account" "sensitive" {
  account_id   = "sensitive-data-service-account"
  display_name = "Sensitive Data Handler"
}

resource "google_service_account_iam_policy" "sensitive" {
  service_account_id = google_service_account.sensitive.name
  policy_data        = data.google_iam_policy.sa.policy_data
}

data "google_client_openid_userinfo" "me" {}

data "google_iam_policy" "sa" {
  binding {
    role = "roles/iam.serviceAccountUser"

    members = [
      "user:${data.google_client_openid_userinfo.me.email}",
    ]
  }
}

resource "google_kms_crypto_key_iam_member" "service_account_use" {
  crypto_key_id = google_kms_crypto_key.key.id
  role          = "roles/cloudkms.cryptoKeyEncrypterDecrypter"
  member        = "serviceAccount:${google_service_account.sensitive.email}"
}

resource "google_compute_disk" "encrypted" {
  name = "encrypted"
  size = "10"
  type = "pd-standard"
  zone = "${var.region}-a"

  disk_encryption_key {
    kms_key_self_link       = google_kms_crypto_key.key.id
    kms_key_service_account = google_service_account.sensitive.email
  }

  depends_on = [google_kms_crypto_key_iam_member.service_account_use]
}
To create an encrypted Cloud SQL database, add the following resources to main.tf:
resource "google_compute_network" "vpc_network" {
  name = "vpc-network"
}

resource "google_sql_database_instance" "encrypted" {
  provider = google-beta

  name                = "encrypted-instance"
  database_version    = "POSTGRES_13"
  region              = var.region
  deletion_protection = false
  encryption_key_name = google_kms_crypto_key.key.id

  settings {
    tier = "db-f1-micro"

    ip_configuration {
      private_network = google_compute_network.vpc_network.id
    }
  }

  depends_on = [google_kms_crypto_key_iam_member.sql_binding]
}

resource "google_kms_crypto_key_iam_member" "sql_binding" {
  crypto_key_id = google_kms_crypto_key.key.id
  role          = "roles/cloudkms.cryptoKeyEncrypterDecrypter"
  member = join("", [
    "serviceAccount:service-",
    data.google_project.current.number,
    "@gcp-sa-cloud-sql.iam.gserviceaccount.com"
  ])

  depends_on = [null_resource.create_database_service_account]
}

resource "null_resource" "create_database_service_account" {
  provisioner "local-exec" {
    command = join(" ", [
      "gcloud beta services identity create",
      "--project=${var.project_id}",
      "--service=sqladmin.googleapis.com"
    ])
  }
}
To create an encrypted Cloud Storage bucket, add the following resources to main.tf:
data "google_project" "current" {}

resource "google_storage_bucket" "encrypted" {
  name          = "${data.google_project.current.project_id}-encrypted"
  force_destroy = true
  location      = var.region

  encryption {
    default_kms_key_name = google_kms_crypto_key.key.id
  }

  depends_on = [google_kms_crypto_key_iam_member.storage_binding]
}

data "google_storage_project_service_account" "this" {}

resource "google_kms_crypto_key_iam_member" "storage_binding" {
  crypto_key_id = google_kms_crypto_key.key.id
  role          = "roles/cloudkms.cryptoKeyEncrypterDecrypter"
  member = join("", [
    "serviceAccount:",
    data.google_storage_project_service_account.this.email_address
  ])
}
Review the resources to be created by running terraform plan, and then run terraform apply to make the changes.
Discussion
In this recipe, you saw how to deploy the following resources with the data encrypted at rest:
- Cloud compute volumes
- Cloud SQL databases
- Cloud storage buckets
Warning
This recipe used a single, shared Cloud KMS key to secure all the resources deployed. In a normal scenario, you should be using multiple keys to enable finely grained access control and limit the blast radius of a breach.
As you can see in the recipe, in order to use a key with a particular resource, you need to give the correct service account permission to leverage the key. Most resources will automatically fall back to a default project key when you do not specify which key to use. However, it is best to create distinct keys so you can segment resources within a project and maintain the principle of least privilege.
When you create a key, the principal you use retains editor rights so you can continue to administer the key. Additionally, every key belongs to a key ring, a logical grouping that lets you manage sets of keys together and that carries its own permissions, distinct from those on the keys themselves. For example, you can allow someone to create new keys that they manage without giving them access to manage preexisting keys in the ring.
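To illustrate the split between ring-level and key-level access, here is a minimal sketch that grants administration at the key ring level, assuming a hypothetical Google group as the principal:

resource "google_kms_key_ring_iam_member" "ring_admin" {
  # Administer keys in this ring; key usage roles are granted separately, per key
  key_ring_id = google_kms_key_ring.keyring.id
  role        = "roles/cloudkms.admin"
  member      = "group:data-platform-admins@example.com"
}

Key usage roles, such as cryptoKeyEncrypterDecrypter, are then granted on individual keys, as the recipe does.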
When defining IAM permissions for the keys in Terraform, you are given the following three types of resources:
google_kms_crypto_key_iam_policy
- The policy resource allows for the authoritative setting of IAM access to the key; applying this resource removes all other preexisting assignments. Use this resource if you wish these policies to be the only ones on the key.

google_kms_crypto_key_iam_binding
- The binding resource allows for the authoritative setting of IAM access to a key for a particular role; all other assignments for that role are removed. Use this resource to give cryptoKeyEncrypterDecrypter access to only the defined members (see the sketch after this list).

google_kms_crypto_key_iam_member
- The member resource allows for the nonauthoritative setting of permissions on the key. In the recipe, you used this resource to set the required permissions individually for each use case, without overwriting previously granted access.
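As a hedged sketch of the binding resource, the following would make the recipe's service account the only member holding cryptoKeyEncrypterDecrypter on the key. It reuses the key and service account defined earlier; do not combine it with the member resources above, as applying it would remove their assignments for this role:

resource "google_kms_crypto_key_iam_binding" "encrypt_decrypt" {
  # Authoritative: only the listed members retain this role on the key
  crypto_key_id = google_kms_crypto_key.key.id
  role          = "roles/cloudkms.cryptoKeyEncrypterDecrypter"

  members = [
    "serviceAccount:${google_service_account.sensitive.email}",
  ]
}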
Between the three resource types, you saw three different methods for using service accounts with KMS keys.
To create the Compute Disk, you needed to create a bespoke service account and assign that when creating the resources.
To create the Cloud SQL database, you needed to use a null_resource
to invoke gcloud beta
to create a specific service account not natively supported by Terraform.
To create the Cloud Storage bucket, there is a Terraform data provider that gives you the details of the project-specific Service Account you need to use.
To know what service account you need for each resource type, refer to the service documentation. However, by using Recipe 6.1, you make it possible to produce reusable artifacts that make it simple for every team to enable encryption at rest.
Summary
Let’s summarize what was learned and deployed in this recipe:
- On GCP, Cloud KMS gives you the ability to encrypt resources at rest.
- Although GCP will encrypt many resources by default with an automatically generated key, you should look to create distinct keys that you can control.
- Keys are created under a key ring, which allows for managing keys by logical groupings.
- In order for a resource to leverage a key, the cloudkms.cryptoKeyEncrypterDecrypter role needs to be given to the appropriate service account.
- Depending on the resource type, what service account to use and how to create it varies.
- You saw examples of how to encrypt Compute Engine disks, Cloud SQL databases, and Cloud Storage buckets, which all require different approaches for having the correct service account.
4.2 Encrypting Data at Rest on AWS
Solution
If you haven’t already done so, familiarize yourself with Terraform and the different authentication mechanisms in Chapter 11.
Create a variables.tf file and copy the following contents:
variable "key_administrator_role" {
  type        = string
  description = "The role used to administer the key"
}

variable "vpc_id" {
  type        = string
  description = "The ID of the VPC to host the database"
}

variable "database_subnet_ids" {
  type        = list(string)
  description = "The IDs of the subnets to host the database"
}
Then fill out the corresponding terraform.tfvars file:
key_administrator_role = ""
vpc_id                 = ""
database_subnet_ids    = [""]
Create the following provider.tf file and run terraform init:

provider "aws" {}

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3"
    }
    random = {
      source  = "hashicorp/random"
      version = "~> 3"
    }
  }
}
Create the following main.tf file:
resource "aws_kms_key" "key" {
  policy = data.aws_iam_policy_document.key_policy.json
}

data "aws_iam_policy_document" "key_policy" {
  statement {
    sid = "Allow access for Key Administrators"

    actions = [
      "kms:Create*",
      "kms:Describe*",
      "kms:Enable*",
      "kms:List*",
      "kms:Put*",
      "kms:Update*",
      "kms:Revoke*",
      "kms:Disable*",
      "kms:Get*",
      "kms:Delete*",
      "kms:TagResource",
      "kms:UntagResource",
      "kms:ScheduleKeyDeletion",
      "kms:CancelKeyDeletion"
    ]

    effect = "Allow"

    principals {
      type        = "AWS"
      identifiers = [var.key_administrator_role]
    }

    resources = ["*"]
  }

  statement {
    sid = "Allow use of the key"

    actions = [
      "kms:Encrypt",
      "kms:Decrypt",
      "kms:ReEncrypt*",
      "kms:GenerateDataKey*",
      "kms:DescribeKey"
    ]

    effect = "Allow"

    principals {
      type        = "AWS"
      identifiers = ["*"]
    }

    resources = ["*"]
  }

  statement {
    sid    = "Allow attachment of persistent resources"
    effect = "Allow"

    principals {
      type        = "AWS"
      identifiers = ["*"]
    }

    actions = [
      "kms:CreateGrant",
      "kms:ListGrants",
      "kms:RevokeGrant"
    ]

    resources = ["*"]

    condition {
      test     = "Bool"
      variable = "kms:GrantIsForAWSResource"
      values   = [true]
    }
  }
}
To create an encrypted Elastic Block Store (EBS) volume, enable EBS encryption by default, and set the default EBS encryption key by adding the following resources to main.tf:
data "aws_region" "current" {}

resource "aws_ebs_default_kms_key" "this" {
  key_arn = aws_kms_key.key.arn
}

resource "aws_ebs_encryption_by_default" "this" {
  enabled = true

  depends_on = [aws_ebs_default_kms_key.this]
}

resource "aws_ebs_volume" "this" {
  availability_zone = "${data.aws_region.current.name}a"
  size              = 1
  type              = "gp3"

  depends_on = [aws_ebs_encryption_by_default.this]
}
To create an encrypted RDS database, add the following resources to main.tf:
resource "random_password" "database" {
  length           = 16
  special          = true
  override_special = "_%@"
}

resource "aws_db_instance" "default" {
  allocated_storage      = 10
  db_subnet_group_name   = aws_db_subnet_group.default.name
  engine                 = "postgres"
  engine_version         = "13.2"
  instance_class         = "db.t3.micro"
  name                   = "encrypteddatabase"
  kms_key_id             = aws_kms_key.key.arn
  username               = "postgres"
  password               = random_password.database.result
  parameter_group_name   = "default.postgres13"
  skip_final_snapshot    = true
  storage_encrypted      = true
  vpc_security_group_ids = [aws_security_group.database.id]
}

resource "aws_db_subnet_group" "default" {
  subnet_ids = var.database_subnet_ids
}

resource "aws_security_group" "database" {
  vpc_id = var.vpc_id
}

output "database_password" {
  value     = aws_db_instance.default.password
  sensitive = true
}
To create an encrypted S3 bucket, add the following resources to main.tf:
resource "aws_s3_bucket" "encrypted_bucket" {
  server_side_encryption_configuration {
    rule {
      apply_server_side_encryption_by_default {
        kms_master_key_id = aws_kms_key.key.arn
        sse_algorithm     = "aws:kms"
      }
    }
  }
}
Review the resources to be created by running terraform plan, and then run terraform apply to make the changes.
Discussion
In this recipe, you saw how to deploy the following resources with the data encrypted at rest:
- EBS volumes
- RDS databases
- S3 buckets
Warning
This recipe used a single, shared AWS KMS key to secure all the resources deployed. In a normal scenario, you should be using multiple keys to enable finely grained access control and limit the blast radius of a breach.
The service that lives at the core of AWS encryption is AWS Key Management Service (KMS). For many services, AWS will provision a KMS key for that service that you can leverage for encryption, known as an AWS-managed CMK (customer master key). Although this potentially simplifies things, it creates a single point of failure or compromise shared across many resources.
Instead, you should create your own KMS keys, known as customer-managed keys, and apply them explicitly to resources. By doing this, you have the ability to be explicit about who can use what key where, and control potential privilege escalation.
Three different kinds of KMS policies
When looking at KMS policies, there are generally three kinds of users who need to be able to interact with a key:
- Administrators, allowing them to control usage of the key but not use the key
- Users, allowing them to use the key but not change how it can be used
- AWS services, allowing them to temporarily leverage the key as required
Setting up the policy to enable administrators and users is relatively trivial. However, to set up access that AWS services can leverage, you need the permission to create grants. Grants are a way to give temporary permission to AWS principals to use a CMK. They only allow the principal to use the minimum subset of required KMS operations: encrypting and decrypting data, creating grants, and retiring or revoking grants.
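Grants are normally created by the services themselves, which is why the key policy above allows kms:CreateGrant, but you can also create one explicitly. A minimal sketch, assuming a hypothetical variable holding the ARN of a role that an AWS service assumes on your behalf:

resource "aws_kms_grant" "service_use" {
  # Temporary, scoped permission for the grantee to use the recipe's key
  name              = "service-use"
  key_id            = aws_kms_key.key.key_id
  grantee_principal = var.service_role_arn
  operations        = ["Encrypt", "Decrypt", "GenerateDataKey"]
}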
Encrypting data at rest on EBS
With EBS, you need to ensure that both volumes and snapshots are encrypted. When a volume is encrypted, any snapshots taken from it will be encrypted as well. You cannot encrypt an existing unencrypted volume in place; encryption can only be enabled at creation. To move the data to encrypted storage, you need to take a snapshot and restore it, which can be done with the following Terraform:
data "aws_ebs_volume" "unencrypted" {
  filter {
    name   = "volume-id"
    values = [var.volume_id]
  }
}

resource "aws_ebs_snapshot" "conversion" {
  volume_id = var.volume_id
}

resource "aws_ebs_volume" "encrypted" {
  availability_zone = data.aws_ebs_volume.unencrypted.availability_zone
  encrypted         = true
  snapshot_id       = aws_ebs_snapshot.conversion.id
  type              = data.aws_ebs_volume.unencrypted.type
}
If you have unencrypted snapshots, you can copy them to encrypt them, like so:
data "aws_region" "current" {}

resource "aws_ebs_snapshot_copy" "snapshot" {
  source_snapshot_id = var.snapshot_id
  source_region      = data.aws_region.current.name
  encrypted          = true
}
Encrypting data at rest on RDS
With databases on RDS, the rules for enabling encryption are similar to EBS. To encrypt an existing database, you need to take a snapshot and restore it, and to encrypt an existing snapshot, you need to copy it. The following two code snippets show how to perform each of these operations.
The following Terraform encrypts an existing database by creating and then restoring from a snapshot:
resource "aws_db_snapshot" "conversion" {
  db_instance_identifier = var.database_id
  db_snapshot_identifier = "encryption_conversion"
}

resource "aws_db_instance" "default" {
  allocated_storage      = 10
  db_subnet_group_name   = aws_db_subnet_group.default.name
  engine                 = "postgres"
  engine_version         = "13.2"
  instance_class         = "db.t3.micro"
  name                   = "encrypteddatabase"
  kms_key_id             = aws_kms_key.rds.arn
  username               = var.username
  password               = var.password
  parameter_group_name   = "default.postgres13"
  skip_final_snapshot    = true
  snapshot_identifier    = aws_db_snapshot.conversion.id
  storage_encrypted      = true
  vpc_security_group_ids = [aws_security_group.database.id]
}
The following Terraform uses a null resource to create an encrypted copy of an existing database snapshot:
resource "null_resource" "create_encrypted_copy" {
  provisioner "local-exec" {
    command = join(" ", [
      "aws rds copy-db-snapshot",
      "--source-db-snapshot-identifier ${var.snapshot_arn}",
      "--target-db-snapshot-identifier encryptedsnapshot",
      "--kms-key-id ${aws_kms_key.rds.arn}"
    ])
  }
}
Encrypting data at rest on S3
With S3, there are four options for encryption:
- AWS-managed CMKs
- Customer-managed CMKs
- S3-managed encryption keys
- Customer-provided encryption keys

To continue with the theme of using customer-managed CMKs to retain control of how keys are used, this recipe focused on the second option. To see the last option in action, see Recipe 4.5. Depending on your use case, you also need to look into how objects stored in your bucket are encrypted. Imagine a case where some highly sensitive objects need to be encrypted with a specific KMS key, not the bucket default. By using bucket policies, you can force users to conform to certain encryption standards. Let's look at two examples: enforcing that KMS is used for the objects, and enforcing that a specific KMS key is used.
The following bucket policy enforces that a KMS key must be used:
data "aws_iam_policy_document" "kms_enforcement" {
  statement {
    effect    = "Deny"
    actions   = ["s3:PutObject"]
    resources = ["${aws_s3_bucket.kms_enforcement.arn}/*"]

    principals {
      type        = "*"
      identifiers = ["*"]
    }

    condition {
      test     = "StringNotEquals"
      values   = ["aws:kms"]
      variable = "s3:x-amz-server-side-encryption"
    }
  }
}
The following bucket policy enforces that a specific KMS key be used:
data "aws_iam_policy_document" "specific_kms_enforcement" {
  statement {
    effect    = "Deny"
    actions   = ["s3:PutObject"]
    resources = ["${aws_s3_bucket.specific_kms_enforcement.arn}/*"]

    principals {
      type        = "*"
      identifiers = ["*"]
    }

    condition {
      test     = "StringNotEquals"
      values   = [aws_kms_key.s3.arn]
      variable = "s3:x-amz-server-side-encryption-aws-kms-key-id"
    }
  }
}
Summary
Let’s summarize what was learned and deployed in this recipe:
- On AWS, your core service for encryption is KMS.
- AWS provides AWS-managed KMS keys that are used for default encryption of resources.
- KMS allows you to create customer-managed keys.
- By creating customer-managed CMKs, you can explicitly control and audit usage.
- Many resources on AWS need to be recreated to enable data-at-rest encryption.
- Some resources allow you to set specific policies governing how users interact with them, allowing you to enforce encryption standards.
4.3 Encrypting Data at Rest on Azure
Solution
If you haven’t already done so, familiarize yourself with Terraform and the different authentication mechanisms in Chapter 11.
Create a variables.tf file and copy the following contents:
variable "location" {
  type        = string
  description = "The location to deploy your resource into"
}

variable "storage_account_name" {
  type        = string
  description = "The name of the storage account"
}
Then fill out the corresponding terraform.tfvars file:
location             = ""
storage_account_name = ""

Create the following provider.tf file and run terraform init:

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 2"
    }
    random = {
      source  = "hashicorp/random"
      version = "~> 3"
    }
  }
}

provider "azurerm" {
  features {}
}
Create the following main.tf file:
resource "random_string" "key_vault" {
  length  = 16
  special = false
}

data "azurerm_client_config" "current" {}

resource "azurerm_resource_group" "encrypted" {
  name     = "encrypted"
  location = var.location
}

resource "azurerm_key_vault" "keys" {
  name                        = random_string.key_vault.result
  location                    = azurerm_resource_group.encrypted.location
  resource_group_name         = azurerm_resource_group.encrypted.name
  tenant_id                   = data.azurerm_client_config.current.tenant_id
  enabled_for_disk_encryption = true
  soft_delete_retention_days  = 7
  purge_protection_enabled    = true

  sku_name = "standard"
}

resource "azurerm_key_vault_key" "key" {
  name         = "key"
  key_vault_id = azurerm_key_vault.keys.id
  key_type     = "RSA"
  key_size     = 2048

  key_opts = [
    "decrypt", "encrypt", "sign", "unwrapKey", "verify", "wrapKey"
  ]
}

resource "azurerm_key_vault_access_policy" "client" {
  key_vault_id = azurerm_key_vault.keys.id
  tenant_id    = data.azurerm_client_config.current.tenant_id
  object_id    = data.azurerm_client_config.current.object_id

  key_permissions = [
    "get", "create", "delete"
  ]

  secret_permissions = [
    "get"
  ]
}
To create an encrypted managed disk, add the following resources to main.tf:
resource "azurerm_disk_encryption_set" "des" {
  name                = "des"
  resource_group_name = azurerm_resource_group.encrypted.name
  location            = azurerm_resource_group.encrypted.location
  key_vault_key_id    = azurerm_key_vault_key.key.id

  identity {
    type = "SystemAssigned"
  }
}

resource "azurerm_key_vault_access_policy" "disk" {
  key_vault_id = azurerm_key_vault.keys.id

  tenant_id = azurerm_disk_encryption_set.des.identity.0.tenant_id
  object_id = azurerm_disk_encryption_set.des.identity.0.principal_id

  key_permissions = [
    "Get",
    "WrapKey",
    "UnwrapKey"
  ]
}

resource "azurerm_managed_disk" "encrypted" {
  name                   = "encryption-test"
  location               = azurerm_resource_group.encrypted.location
  resource_group_name    = azurerm_resource_group.encrypted.name
  storage_account_type   = "Standard_LRS"
  create_option          = "Empty"
  disk_size_gb           = "1"
  disk_encryption_set_id = azurerm_disk_encryption_set.des.id
}
To create an encrypted database, add the following resources to main.tf:
resource "random_password" "database" {
  # Supplies the administrator password referenced below; the random provider
  # is declared in provider.tf.
  length  = 16
  special = false
}

resource "azurerm_postgresql_server" "database" {
  name                = "encrypted-database"
  location            = azurerm_resource_group.encrypted.location
  resource_group_name = azurerm_resource_group.encrypted.name

  administrator_login          = "postgres"
  administrator_login_password = random_password.database.result

  sku_name   = "GP_Gen5_2"
  version    = "11"
  storage_mb = 5120

  ssl_enforcement_enabled = true

  threat_detection_policy {
    disabled_alerts      = []
    email_account_admins = false
    email_addresses      = []
    enabled              = true
    retention_days       = 0
  }

  identity {
    type = "SystemAssigned"
  }
}
To create an encrypted storage account, add the following resources to main.tf:
resource "azurerm_key_vault_access_policy" "storage" {
  key_vault_id = azurerm_key_vault.keys.id
  tenant_id    = data.azurerm_client_config.current.tenant_id
  object_id    = azurerm_storage_account.sensitive.identity.0.principal_id

  key_permissions = [
    "get", "unwrapkey", "wrapkey"
  ]

  secret_permissions = [
    "get"
  ]
}

resource "azurerm_storage_account" "sensitive" {
  name                     = var.storage_account_name
  resource_group_name      = azurerm_resource_group.encrypted.name
  location                 = azurerm_resource_group.encrypted.location
  account_tier             = "Standard"
  account_replication_type = "LRS"

  identity {
    type = "SystemAssigned"
  }
}

resource "azurerm_storage_account_customer_managed_key" "sensitive" {
  storage_account_id = azurerm_storage_account.sensitive.id
  key_vault_id       = azurerm_key_vault.keys.id
  key_name           = azurerm_key_vault_key.key.name
}
Review the resources to be created by running terraform plan, and then run terraform apply to make the changes.
Discussion
In this recipe, you saw how to deploy the following resources with the data encrypted at rest:
- Managed disks
- PostgreSQL databases
- Storage accounts
Warning
This recipe used a single, shared key within the Key Vault to secure all the resources deployed. In a normal scenario, you should be using multiple keys to enable finely grained access control and limit the blast radius of a breach.
This brings us to the topic of how IAM is applied in the context of Key Vaults in Azure. In this recipe, you defined multiple access policies, each enabling the specific usage required by a different principal. First was the azurerm_key_vault_access_policy.client resource, ensuring that the principal who created the vault retained the ability to create and delete keys as required.
Then as you created the workload resources, each time in turn you needed to apply a distinct access policy to allow the managed identity to perform the required operations with the key. In all three cases, the identity can only perform the get
, unwrapkey
, and wrapkey
operations, the minimum set of permissions required. As a further step, you could extend the recipe to not leverage SystemAssigned
identities, instead defining your own to further segment who can leverage what keys. Rather than having a shared system identity between resources, by them having distinct identities with access to different keys, you can handle different levels of data sensitivity.
It’s one thing to enable encryption on the resources that you own and control. The next step is understanding where other people are not conforming to the same approach. How can you know when people are not correctly using CMKs? For that, you need to turn to Azure Policy. Let’s look at how you can apply policies to subscriptions that hold sensitive data and identify where CMKs have not been used for the three resources looked at in this recipe.
You can extend this recipe with the following variable:
variable "sensitive_subscription_ids" {
  type        = list(string)
  description = "The IDs of the sensitive data subscriptions"
}
And then add the following data provider and resources to apply the policy to the selected subscriptions:
data "azurerm_subscription" "subscription" {
  for_each = toset(var.sensitive_subscription_ids)

  subscription_id = each.value
}

resource "azurerm_policy_assignment" "storage_cmk" {
  for_each = toset(var.sensitive_subscription_ids)

  name  = "storage-cmk-${each.value}"
  scope = data.azurerm_subscription.subscription[each.value].id
  policy_definition_id = join("", [
    "/providers/Microsoft.Authorization/policyDefinitions/",
    "b5ec538c-daa0-4006-8596-35468b9148e8"
  ])
}

resource "azurerm_policy_assignment" "postgres_cmk" {
  for_each = toset(var.sensitive_subscription_ids)

  name  = "postgres-cmk-${each.value}"
  scope = data.azurerm_subscription.subscription[each.value].id
  policy_definition_id = join("", [
    "/providers/Microsoft.Authorization/policyDefinitions/",
    "18adea5e-f416-4d0f-8aa8-d24321e3e274"
  ])
}

resource "azurerm_policy_assignment" "disk_cmk" {
  for_each = toset(var.sensitive_subscription_ids)

  name  = "disk-cmk-${each.value}"
  scope = data.azurerm_subscription.subscription[each.value].id
  policy_definition_id = join("", [
    "/providers/Microsoft.Authorization/policyDefinitions/",
    "702dd420-7fcc-42c5-afe8-4026edd20fe0"
  ])
}
Summary
Let’s summarize what was learned and deployed in this recipe:
- On Azure, for key management, you create keys within Key Vaults.
- By applying access policies to your vaults, you can control who has access to keys.
- You should have separate identities for managing keys and using keys.
- Giving resources access to leverage customer-managed keys involves giving their identity access.
- You can either give access to the SystemAssigned identity or create and manage identities yourself.
- You created an encrypted storage account, PostgreSQL database, and disk.
- By assigning Azure Policies to subscriptions, you can detect where people are not leveraging CMKs when required.
4.4 Encrypting Data on GCP with Your Own Keys
Solution
If you haven’t already done so, familiarize yourself with Terraform and the different authentication mechanisms in Chapter 11.
Create a variables.tf file and copy the following contents:
variable "project_id" {
  type        = string
  description = "The project to create the resources in"
}

variable "region" {
  type        = string
  description = "The region to create the resources in"
}

Then fill out the corresponding terraform.tfvars file:

project_id = ""
region     = ""
Create the following provider.tf file and run terraform init:

provider "google" {
  project = var.project_id
  region  = var.region
}

terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 3"
    }
  }
}
Create the following main.tf file and run terraform plan:

data "google_project" "current" {}

resource "google_storage_bucket" "csek" {
  name          = "${data.google_project.current.project_id}-csek"
  force_destroy = true
  location      = var.region
}

output "storage_bucket_name" {
  value = google_storage_bucket.csek.name
}
Review the resources that are going to be created, and then run terraform apply to make the changes.
Install the pycryptodomex and google-cloud-storage libraries by running pip install pycryptodomex google-cloud-storage.
Note
This code is just demonstrative for the recipe. You should use specialized software for the creation and management of your keys.
Create the following generate_data_key.py file and run python generate_data_key.py to create a local key file.

import base64

from Cryptodome.Random import get_random_bytes

key = get_random_bytes(32)
print(key)

with open("key", "w") as file:
    file.write(str(base64.b64encode(key), "utf-8"))
Copy a file you wish to store encrypted into your working directory.
Create the following upload_file.py file:
import base64
import sys
from subprocess import run

from google.cloud import storage


def upload(file_name):
    storage_client = storage.Client()
    bucket_name = (
        run(
            "terraform output storage_bucket_name",
            capture_output=True,
            check=True,
            shell=True,
        )
        .stdout.decode("utf-8")
        .split('"')[1]
    )
    bucket = storage_client.bucket(bucket_name)
    with open("key", "r") as file:
        encryption_key = base64.b64decode(file.read())
    blob = bucket.blob(file_name, encryption_key=encryption_key)
    blob.upload_from_filename(file_name)


if __name__ == "__main__":
    upload(sys.argv[1])
To upload your file to the Cloud Storage bucket, run python upload_file.py with the name of your file. For example, run python upload_file.py message.txt.
Discussion
The following Python download_file.py file will download your file to your local directory:
import base64
import sys
from subprocess import run

from google.cloud import storage


def download(file_key, file_name):
    storage_client = storage.Client()
    bucket_name = (
        run(
            "terraform output storage_bucket_name",
            capture_output=True,
            check=True,
            shell=True,
        )
        .stdout.decode("utf-8")
        .split('"')[1]
    )
    bucket = storage_client.bucket(bucket_name)
    with open("key", "r") as file:
        encryption_key = base64.b64decode(file.read())
    blob = bucket.blob(file_key, encryption_key=encryption_key)
    blob.download_to_filename(file_name)


if __name__ == "__main__":
    download(sys.argv[1], sys.argv[2])
To execute the code, run python download_file.py with the name of the file you uploaded and the filename to use for the copy. For example, run python download_file.py message.txt message_copy.txt.
Note
Files stored with this encryption mechanism cannot be uploaded or downloaded through the console.
Managing your own keys quickly becomes a laborious practice. The burden of rotating, securing, and providing access to keys is something that should only be shouldered when explicitly required. For the vast majority of use cases, Cloud KMS should suffice; if you need stronger guarantees, Cloud Hardware Security Module (HSM) allows you to leverage fully managed, FIPS 140-2 Level 3 certified HSMs.
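For instance, you can request HSM protection for a Cloud KMS key without managing any key material yourself. A minimal sketch, reusing the key ring pattern from Recipe 4.1:

resource "google_kms_crypto_key" "hsm_key" {
  name     = "sensitive-data-hsm-key"
  key_ring = google_kms_key_ring.keyring.id

  version_template {
    # Key versions are generated and held in Google-managed HSMs
    algorithm        = "GOOGLE_SYMMETRIC_ENCRYPTION"
    protection_level = "HSM"
  }
}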
If it is required that key material exist outside GCP, then use the Cloud External Key Manager (Cloud EKM). This allows you to leverage third-party key management services from vendors such as the following:
- Fortanix
- Ionic
- Thales
- Equinix SmartKey
- Unbound Tech

This offering is only supported on a small subset of services, including the following:

- Compute Engine
- Secrets Manager
- Cloud SQL
Summary
Let’s summarize what was learned and deployed in this recipe:
- On GCP you can create and use your own encryption keys, known as customer-supplied encryption keys, or CSEKs.
- They can be used with Cloud Storage to encrypt objects, so only those who hold the key can decrypt them; even GCP cannot do so.
- You wrote Python for generating a key, and then using the key to upload and download files securely.
- CSEKs should only be used when absolutely required, as the maintenance burden is high.
- Other options besides Cloud KMS for encrypting data include Cloud HSM and Cloud EKM.
4.5 Encrypting Data on AWS with Your Own Keys
Solution
If you haven’t already done so, familiarize yourself with Terraform and the different authentication mechanisms in Chapter 11.
Create the following provider.tf file and run terraform init:

provider "aws" {}

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3"
    }
  }
}
Create the following main.tf file and run terraform plan:

resource "aws_s3_bucket" "bucket" {}

output "bucket_name" {
  value = aws_s3_bucket.bucket.bucket
}
Review the resources that are going to be created, and then run terraform apply to make the changes.
Install the pycryptodomex and boto3 libraries by running pip install pycryptodomex boto3.
Create the following generate_data_key.py file and run python generate_data_key.py to create a local key file.
Note
This code is just demonstrative for the recipe. You should use specialized software for the creation and management of your keys.
import base64

from Cryptodome.Random import get_random_bytes

key = get_random_bytes(32)
print(key)

with open("key", "w") as file:
    file.write(str(base64.b64encode(key), "utf-8"))
Copy a file you wish to store encrypted into your working directory.
Create the following put_object.py file:
import base64
import subprocess
import sys

import boto3

filename = sys.argv[1]

bucket_name = (
    subprocess.run(
        "terraform output bucket_name",
        shell=True,
        check=True,
        capture_output=True,
    )
    .stdout.decode("utf-8")
    .split('"')[1]
)

with open("key", "r") as file:
    key = base64.b64decode(file.read())

s3 = boto3.client("s3")

with open(filename, "r") as file:
    s3.put_object(
        Body=file.read(),
        Bucket=bucket_name,
        Key=filename,
        SSECustomerAlgorithm="AES256",
        SSECustomerKey=key,
    )
To upload your file to the S3 bucket, run python put_object.py with the name of your file. For example, run python put_object.py message.txt.
Discussion
The following Python get_object.py file will download your file to your local directory:
import base64
import subprocess
import sys

import boto3

filename = sys.argv[1]
copy_filename = sys.argv[2]

with open("key", "r") as file:
    key = base64.b64decode(file.read())

s3 = boto3.client("s3")

bucket_name = (
    subprocess.run(
        "terraform output bucket_name",
        shell=True,
        check=True,
        capture_output=True,
    )
    .stdout.decode("utf-8")
    .split('"')[1]
)

# Retrieve the object with the customer-supplied key and write the copy locally
with open(copy_filename, "w") as file:
    file.write(
        s3.get_object(
            Bucket=bucket_name,
            Key=filename,
            SSECustomerAlgorithm="AES256",
            SSECustomerKey=key,
        )["Body"]
        .read()
        .decode()
    )
To execute the code, run python get_object.py with the name of the file you uploaded and the filename to use for the copy. For example, run python get_object.py message.txt message_copy.txt.
Note
Objects stored with this encryption mechanism cannot be uploaded or downloaded through the console.
Customer-supplied encryption keys should only be used when it is necessary that the key material be created, managed, and owned outside of AWS. Where possible, you should look to leverage AWS KMS to create and manage keys. By creating them yourself, you take on a much larger burden of responsibility. The processes of protecting, serving, and rotating keys all become areas where you need to invest significant time.
In “Encrypting data at rest on S3”, you saw a bucket policy that enforced that consumers use a KMS key to encrypt their objects. Following is a Terraform data provider snippet that configures a similar policy that ensures that users encrypt objects with an AES256 key:
data "aws_iam_policy_document" "kms_enforcement" {
  statement {
    effect    = "Deny"
    actions   = ["s3:PutObject"]
    resources = ["${aws_s3_bucket.kms_enforcement.arn}/*"]

    principals {
      type        = "*"
      identifiers = ["*"]
    }

    condition {
      test     = "StringNotEquals"
      values   = ["AES256"]
      variable = "s3:x-amz-server-side-encryption"
    }
  }
}
Because AWS does not store any information related to the customer-supplied key, there is no policy that allows you to enforce that a specific key is used, as there is with KMS managed keys.
Summary
Let’s summarize what was learned and deployed in this recipe:
- On AWS, you can supply your own encryption keys to store objects in S3, known as customer-supplied keys.
- As the keys are not stored on AWS, you prevent anyone without direct access to the keys from accessing the objects.
- By adopting this technique, you shoulder the large burden of key rotation, access, and creation.
- In order to use a customer-supplied key, you will need to provide it for both storing and retrieving objects.
- It is possible to enforce the use of encryption keys with bucket policies.
4.6 Encrypting Data on Azure with Your Own Keys
Solution
If you haven’t already done so, familiarize yourself with Terraform and the different authentication mechanisms in Chapter 11.
Create a variables.tf file and copy the following contents:
variable "location" {
  type        = string
  description = "The location to deploy your resource into"
}

variable "storage_account_name" {
  type        = string
  description = "The name of the storage account"
}

Then fill out the corresponding terraform.tfvars file:

location             = ""
storage_account_name = ""
Create the following provider.tf file and run terraform init:

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 2"
    }
  }
}

provider "azurerm" {
  features {}
}
Create the following main.tf file and run terraform plan:

resource "azurerm_resource_group" "csks" {
  name     = "csks"
  location = var.location
}

resource "azurerm_storage_account" "csk" {
  name                     = var.storage_account_name
  resource_group_name      = azurerm_resource_group.csks.name
  location                 = azurerm_resource_group.csks.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
}

resource "azurerm_storage_container" "csk" {
  name                  = "csk"
  storage_account_name  = azurerm_storage_account.csk.name
  container_access_type = "private"
}

output "connection_string" {
  value     = azurerm_storage_account.csk.primary_connection_string
  sensitive = true
}

output "container_name" {
  value = azurerm_storage_container.csk.name
}
Review the resources that are going to be created, and then run terraform apply to make the changes.
Install the pycryptodomex, azure-storage-blob, and azure-identity libraries by running pip install pycryptodomex azure-storage-blob azure-identity.
Note
This code is just demonstrative for the recipe. You should use specialized software for the creation and management of your keys.
Create the following generate_data_key.py file and run python generate_data_key.py to create a local key file.
import base64

from Cryptodome.Random import get_random_bytes

key = get_random_bytes(32)
print(key)

with open("key", "w") as file:
    file.write(str(base64.b64encode(key), "utf-8"))
Copy a file you wish to store encrypted into your working directory.
Create the following upload_blob.py file:
import base64
import sys
from hashlib import sha256
from subprocess import run

from azure.identity import AzureCliCredential
from azure.storage.blob import BlobClient, CustomerProvidedEncryptionKey

conn_str = (
    run(
        "terraform output connection_string",
        shell=True,
        check=True,
        capture_output=True,
    )
    .stdout.decode("utf-8")
    .split('"')[1]
)

container_name = (
    run(
        "terraform output container_name",
        shell=True,
        check=True,
        capture_output=True,
    )
    .stdout.decode("utf-8")
    .split('"')[1]
)

credential = AzureCliCredential()

blob = BlobClient.from_connection_string(
    conn_str=conn_str,
    container_name=container_name,
    blob_name=sys.argv[1],
)

with open("key", "r") as file:
    key = file.read()
    hash = sha256(base64.b64decode(key))

with open(sys.argv[1], "rb") as file:
    blob.upload_blob(
        file,
        cpk=CustomerProvidedEncryptionKey(
            key, str(base64.b64encode(hash.digest()), "utf-8")
        ),
    )
To upload your file to the storage container, run python upload_blob.py with the name of your file. For example, run python upload_blob.py message.txt.
Discussion
The following Python download_blob.py file will download your file to your local directory:
import base64
import sys
from hashlib import sha256
from subprocess import run

from azure.identity import AzureCliCredential
from azure.storage.blob import BlobClient, CustomerProvidedEncryptionKey

conn_str = (
    run(
        "terraform output connection_string",
        shell=True,
        check=True,
        capture_output=True,
    )
    .stdout.decode("utf-8")
    .split('"')[1]
)

container_name = (
    run(
        "terraform output container_name",
        shell=True,
        check=True,
        capture_output=True,
    )
    .stdout.decode("utf-8")
    .split('"')[1]
)

credential = AzureCliCredential()

blob = BlobClient.from_connection_string(
    conn_str=conn_str,
    container_name=container_name,
    blob_name=sys.argv[1],
)

with open("key", "r") as file:
    key = file.read()
    hash = sha256(base64.b64decode(key))

with open(f"{sys.argv[1]}_copy", "wb") as file:
    data = blob.download_blob(
        cpk=CustomerProvidedEncryptionKey(
            key, str(base64.b64encode(hash.digest()), "utf-8")
        )
    )
    data.readinto(file)
To execute the code, run python download_blob.py with the name of the file you uploaded; the script writes a copy alongside it with a _copy suffix. For example, running python download_blob.py message.txt creates message.txt_copy.
Note
Blobs stored with this encryption mechanism cannot be uploaded or downloaded through the console.
This recipe is needed due to internal requirements at some businesses. However, if you do not explicitly need to use keys created and managed outside of Azure, you should look to leverage customer-managed keys wherever possible. In using customer-supplied keys, as in this recipe, you take on the nontrivial burden of key management, security, rotation, and provisioning.
Another option is uploading your externally created keys into Azure Key Vault, so you can leverage them through the normal Azure APIs the same way you would a customer-managed key. That allows you to use your own key material with services beyond storage, since the customer-supplied keys shown in this recipe cannot be used with the majority of services.
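The import itself happens outside Terraform; a minimal sketch using the Azure CLI, assuming a hypothetical key_vault_name variable and a locally held external_key.pem file:

resource "null_resource" "import_external_key" {
  provisioner "local-exec" {
    # Imports externally generated RSA key material into an existing Key Vault
    command = join(" ", [
      "az keyvault key import",
      "--vault-name ${var.key_vault_name}",
      "--name imported-key",
      "--pem-file external_key.pem"
    ])
  }
}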
Summary
Let’s summarize what was learned and deployed in this recipe:
- On Azure, you can use what are known as customer-supplied keys for encrypting data at rest.
- The keys are securely discarded when used through API calls and are never persisted in Azure.
- The main service that can use these keys is storage.
- By using customer-supplied keys, you accept a large burden of responsibility and effort.
- You should only use this approach when it is explicitly required.
4.7 Enforcing In-Transit Data Encryption on GCP
Solution
If you haven’t already done so, familiarize yourself with Terraform and the different authentication mechanisms in Chapter 11.
Create a variables.tf file and copy the following contents:
variable "project_id" {
  type        = string
  description = "The project to create the resources in"
}

variable "region" {
  type        = string
  description = "The region to create the resources in"
}

variable "organization_domain" {
  type        = string
  description = "The organization domain of your Google Cloud estate"
}

Then fill out the corresponding terraform.tfvars file:

project_id          = ""
region              = ""
organization_domain = ""
Create the following provider.tf file and run terraform init:

provider "google" {
  project = var.project_id
  region  = var.region
}

terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 3"
    }
    null = {
      source  = "hashicorp/null"
      version = "~> 3"
    }
  }
}
Create the following main.tf file and run terraform plan:

data "google_organization" "current" {
  domain = var.organization_domain
}

data "google_project" "current" {}

resource "google_project_service" "cloud_asset" {
  service = "cloudasset.googleapis.com"
}

resource "null_resource" "cloudasset_service_account" {
  provisioner "local-exec" {
    command = join(" ", [
      "gcloud beta services identity create",
      "--service=cloudasset.googleapis.com",
      "--project=${var.project_id}"
    ])
  }

  depends_on = [google_project_service.cloud_asset]
}

resource "google_bigquery_dataset" "assets" {
  dataset_id                 = "assets"
  delete_contents_on_destroy = true
}

resource "google_project_iam_member" "asset_sa_editor_access" {
  role = "roles/bigquery.dataEditor"
  member = join("", [
    "serviceAccount:service-",
    data.google_project.current.number,
    "@gcp-sa-cloudasset.iam.gserviceaccount.com"
  ])

  depends_on = [null_resource.cloudasset_service_account]
}

resource "google_project_iam_member" "asset_sa_user_access" {
  role = "roles/bigquery.user"
  member = join("", [
    "serviceAccount:service-",
    data.google_project.current.number,
    "@gcp-sa-cloudasset.iam.gserviceaccount.com"
  ])

  depends_on = [null_resource.cloudasset_service_account]
}

resource "null_resource" "run_export" {
  provisioner "local-exec" {
    command = join(" ", [
      "gcloud asset export --content-type resource",
      "--project ${data.google_project.current.project_id}",
      "--bigquery-table ${google_bigquery_dataset.assets.id}/tables/assets",
      "--output-bigquery-force --per-asset-type"
    ])
  }

  depends_on = [
    google_project_iam_member.asset_sa_editor_access,
    google_project_iam_member.asset_sa_user_access
  ]
}
Review the resources that are going to be created, and then run terraform apply to make the changes.
Discussion
This recipe created a BigQuery dataset with a table per resource for all the projects in your organization. With that dataset created, you are now able to query details of resource configurations to find where unencrypted traffic is possible.
Recipe 3.10 introduced Cloud Asset Inventory and built out a mechanism for alerting you when particular resources changed. This recipe extended that with the ability to retroactively ask questions about your estate, which allows you to identify noncompliant resources as your control set grows and matures.
Finding firewall rules with insecure ports
Following is a BigQuery query which will find all firewall rules that allow access on the following three unencrypted ports:
- 21, unencrypted FTP traffic
- 80, unencrypted HTTP traffic
- 3306, unencrypted MySQL traffic
SELECT *
FROM (
  SELECT name, allowed.ports as ports
  FROM `<project-id>.assets.assets_compute_googleapis_com_Firewall` as firewall
  JOIN UNNEST(firewall.resource.data.allowed) as allowed
)
WHERE ARRAY_TO_STRING(ports, "") = "21"
  OR ARRAY_TO_STRING(ports, "") = "80"
  OR ARRAY_TO_STRING(ports, "") = "3306"
Finding load balancers accepting HTTP traffic
As a general rule, web load balancers should be configured to accept HTTPS traffic, not HTTP traffic. The following query identifies the load balancer target proxies that are configured for HTTP traffic:
SELECT resource.data.name, updateTime, resource.parent
FROM `<project-id>.assets.assets_compute_googleapis_com_TargetHttpProxy`
As you can see from these examples, you can write queries to determine what resources match a particular state and return when they were last modified and what project they are under. Unfortunately, the export cannot be configured to automatically run on a schedule, but by using Cloud Functions, as shown in Recipe 6.4, you can build a simple scheduler to run the export. This, coupled with BigQuery scheduled queries, enables you to determine when resources fall outside of your encryption requirements.
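As a hedged sketch, a scheduled query could look like the following, assuming the BigQuery Data Transfer API is enabled in the project and with the query, schedule, and destination table chosen purely for illustration:

resource "google_bigquery_data_transfer_config" "insecure_firewalls" {
  display_name           = "insecure-firewall-check"
  data_source_id         = "scheduled_query"
  location               = var.region
  schedule               = "every 24 hours"
  destination_dataset_id = google_bigquery_dataset.assets.dataset_id

  params = {
    # Writes the non-compliant firewall rules into a results table each run
    destination_table_name_template = "insecure_firewalls"
    write_disposition               = "WRITE_TRUNCATE"
    query                           = "SELECT name FROM assets.assets_compute_googleapis_com_Firewall"
  }
}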
Summary
Let’s summarize what was learned and deployed in this recipe:
- On GCP, you can use Cloud Asset Inventory and BigQuery to dynamically understand how resources are configured.
- This combines with the automated notification component of Recipe 3.10.
- However, it allows you to look at all current resources, as opposed to only acting when a resource is changed.
- You created a BigQuery dataset and exported all resources in your estate into distinct tables.
- Then you saw some example queries for determining when resources are allowing insecure traffic.
- By adding scheduled Cloud Functions and scheduled BigQuery queries, you can build a solution to alert on any configuration you desire.
4.8 Enforcing In-Transit Data Encryption on AWS
Solution
If you haven’t already done so, familiarize yourself with Terraform and the different authentication mechanisms in Chapter 11.
If you have not previously completed Recipe 3.11, go and do that first so that AWS Config is enabled in your accounts.
Create the following provider.tf file and run terraform init:

provider "aws" {}

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3"
    }
  }
}
Create the following main.tf file and run terraform plan:

locals {
  rules_to_deploy = [
    "ALB_HTTP_TO_HTTPS_REDIRECTION_CHECK",
    "API_GW_SSL_ENABLED",
    "ELB_TLS_HTTPS_LISTENERS_ONLY",
    "REDSHIFT_REQUIRE_TLS_SSL",
    "RESTRICTED_INCOMING_TRAFFIC",
    "S3_BUCKET_SSL_REQUESTS_ONLY",
    "VPC_SG_OPEN_ONLY_TO_AUTHORIZED_PORTS"
  ]
}

resource "aws_config_config_rule" "rule" {
  for_each = toset(local.rules_to_deploy)

  name = each.value

  source {
    owner             = "AWS"
    source_identifier = each.value
  }
}
Review the resources that are going to be created, and then run terraform apply to make the changes.
Discussion
This recipe deployed the following series of Managed AWS Config rules to the account that detect when resources are configured to allow certain kinds of unencrypted traffic:
- ALB_HTTP_TO_HTTPS_REDIRECTION_CHECK
  Checks whether Application Load Balancers allow straight HTTP traffic; ideally they automatically redirect clients to HTTPS.
- API_GW_SSL_ENABLED
  Checks whether an SSL certificate has been configured for the API Gateway. Without one, you cannot handle encrypted traffic.
- ELB_TLS_HTTPS_LISTENERS_ONLY
  Checks whether Elastic Load Balancers have listeners configured for unencrypted HTTP traffic.
- REDSHIFT_REQUIRE_TLS_SSL
  Checks whether your Redshift data warehouse only accepts SSL/TLS-based traffic.
- RESTRICTED_INCOMING_TRAFFIC
  Checks whether security groups allow traffic on ports that have secure variants; by default these are 20, 21, 3389, 3306, and 4333, but the rule can be configured to check for specific ports (see the sketch after this list).
- S3_BUCKET_SSL_REQUESTS_ONLY
  Checks whether S3 buckets allow direct HTTP traffic.
- VPC_SG_OPEN_ONLY_TO_AUTHORIZED_PORTS
  Checks whether any security groups with inbound traffic from 0.0.0.0/0 have any ports configured outside an approved list that you control.
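If you need to customize the ports that RESTRICTED_INCOMING_TRAFFIC checks, you could replace its entry in rules_to_deploy with a standalone definition. A minimal sketch, assuming the rule's documented blockedPortN parameters and illustrative port choices:

resource "aws_config_config_rule" "restricted_ports" {
  name = "RESTRICTED_INCOMING_TRAFFIC"

  source {
    owner             = "AWS"
    source_identifier = "RESTRICTED_INCOMING_TRAFFIC"
  }

  # Flag security groups that allow these unencrypted ports
  input_parameters = jsonencode({
    blockedPort1 = "21"
    blockedPort2 = "80"
    blockedPort3 = "3306"
  })
}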
Note
Analyzing VPC flow logs is another way to solve the problem of detecting unencrypted traffic. However, to automate the process would require a third-party application or an internal development effort.
In Recipe 7.8, you’ll see what options exist for actively preventing people from being able to deploy noncompliant infrastructure, but the strategies are not foolproof. This necessitates the ability for you to operate in the same “trust but verify” posture that is a common theme across recipes. In this case, the verification stems from the rules, creating a feedback loop that allows you to understand when teams are in need of support and enablement.
Summary
Let’s summarize what was learned and deployed in this recipe:
- AWS provides a selection of managed Config rules that identify when resources allow for unencrypted traffic.
- They do not cover all resources; however, they do target common culprits.
- Actively preventing noncompliant infrastructure is never foolproof, but by configuring AWS Config rules, you have a feedback loop that allows you to understand when infrastructure doesn't meet the required controls.
- By combining this recipe with Recipe 7.8, you'll be able to deploy these rules across all accounts in the organization, allowing you to see into every account.
4.9 Enforcing In-Transit Data Encryption on Azure
Solution
If you haven’t already done so, familiarize yourself with Terraform and the different authentication mechanisms in Chapter 11.
Create the following provider.tf file and run terraform init:

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 2"
    }
  }
}

provider "azurerm" {
  features {}
}
Create the following main.tf file and run terraform plan:

data "azurerm_subscription" "current" {}

locals {
  policy_ids = [
    "b7ddfbdc-1260-477d-91fd-98bd9be789a6",
    "e802a67a-daf5-4436-9ea6-f6d821dd0c5d",
    "d158790f-bfb0-486c-8631-2dc6b4e8e6af",
    "399b2637-a50f-4f95-96f8-3a145476eb15",
    "4d24b6d4-5e53-4a4f-a7f4-618fa573ee4b",
    "9a1b8c48-453a-4044-86c3-d8bfd823e4f5",
    "6d555dd1-86f2-4f1c-8ed7-5abae7c6cbab",
    "22bee202-a82f-4305-9a2a-6d7f44d4dedb",
    "404c3081-a854-4457-ae30-26a93ef643f9",
    "8cb6aa8b-9e41-4f4e-aa25-089a7ac2581e",
    "f9d614c5-c173-4d56-95a7-b4437057d193",
    "f0e6e85b-9b9f-4a4b-b67b-f730d42f1b0b",
    "a4af4a39-4135-47fb-b175-47fbdf85311d",
  ]

  policy_assignments = azurerm_subscription_policy_assignment.transit
}

resource "azurerm_subscription_policy_assignment" "transit" {
  count = length(local.policy_ids)

  name = "transit${count.index}"
  policy_definition_id = join("", [
    "/providers/Microsoft.Authorization/policyDefinitions/",
    local.policy_ids[count.index]
  ])
  subscription_id = data.azurerm_subscription.current.id
}

resource "azurerm_policy_remediation" "transit" {
  count = length(local.policy_ids)

  name                 = "transit${count.index}"
  scope                = data.azurerm_subscription.current.id
  policy_assignment_id = local.policy_assignments[count.index].id
}
Review the resources that are going to be created, and then run terraform apply to make the changes.
Discussion
In Security Center, the following list of recommendations specifically target encrypted data in transit:
- API App should only be accessible over HTTPS.
- Enforce SSL connection should be enabled for MySQL database servers.
- Enforce SSL connection should be enabled for PostgreSQL database servers.
- FTPS should be required in your API App.
- FTPS should be required in your Functions App.
- FTPS should be required in your Web App.
- Functions App should only be accessible over HTTPS.
- Only secure connections to your Redis Cache should be enabled.
- Secure transfer to storage accounts should be enabled.
- TLS should be updated to the latest version for your API App.
- TLS should be updated to the latest version for your Functions App.
- TLS should be updated to the latest version for your Web App.
- Web App should only be accessible over HTTPS.
In this recipe, the local.policy_ids
variable contains the IDs for each of these recommendations. As Azure Policy is naturally extended over time, this recipe will need updating to be exhaustive for what policies are available. Additionally, with the automated remediation actions here, you can end up in a position where the infrastructure as code is no longer reflective of the reality on Azure. Remediating in this way should be a last resort; instead, by using Recipe 6.3, you will see how to support teams in deploying infrastructure that encrypts data in transit by default. You also run the risk of potentially breaking systems by changing active configurations, which can be politically challenging and erode trust.
By using these policies, you target common misconfigurations around encryption, but the policies alone are not sufficient to ensure data encryption across your entire estate. Running training sessions with delivery teams, running threat modelling sessions, and migrating to more cloud native services, such as the containers in Recipe 6.6, will all make it easier to understand how data moves around your estate, as Azure is leveraged to perform more of the heavy lifting when it comes to encryption.
Summary
Let’s summarize what was learned and deployed in this recipe:
- Azure Security Center provides a series of recommendations on encryption in transit.
- By using the Azure Policies that underpin these recommendations, you can identify and remediate problematic infrastructure.
- Automated remediation actions, while powerful, can undermine infrastructure-as-code usage and potentially erode trust.
- These policies are a great starting point, but ensuring encryption in transit across the entire estate involves the following:
  - Training teams in the how and why of encryption in transit
  - Running threat modelling sessions
  - Providing teams with secure-by-default infrastructure patterns
- By migrating to more cloud native infrastructure such as Recipe 6.6, you can make it simpler to understand how encryption is implemented across your estate.
4.10 Preventing Data Loss on GCP
Solution
If you haven’t already done so, familiarize yourself with Terraform and the different authentication mechanisms in Chapter 11.
You need to create a service account to interact with the Google Workspace APIs.
Create a variables.tf file and copy the following contents:
variable "project_id" {
  type        = string
  description = "The project to create the resources in"
}

variable "region" {
  type        = string
  description = "The region to create the resources in"
}

variable "organization_domain" {
  type        = string
  description = "The organization domain of your Google Cloud estate"
}
Then fill out the corresponding terraform.tfvars file:
project_id = "" region = "" organization_domain = ""
Create the following provider.tf file and run terraform init:

provider "google" {
  project = var.project_id
  region  = var.region
}

terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 3"
    }
    local = {
      source  = "hashicorp/local"
      version = "~> 2"
    }
  }
}
Create the following main.tf file and run terraform plan:

data "google_organization" "current" {
  domain = var.organization_domain
}

resource "google_project_service" "dlp" {
  service = "dlp.googleapis.com"
}

resource "google_service_account" "dlp_admin" {
  account_id   = "dlp-admin"
  display_name = "Data Loss Prevention Configuration"
}

resource "google_organization_iam_member" "dlp_access" {
  org_id = data.google_organization.current.org_id
  role   = "roles/dlp.admin"
  member = "serviceAccount:${google_service_account.dlp_admin.email}"
}

resource "google_project_iam_member" "viewer" {
  role   = "roles/viewer"
  member = "serviceAccount:${google_service_account.dlp_admin.email}"
}

resource "google_project_iam_member" "dataset_owner" {
  role   = "roles/bigquery.dataOwner"
  member = "serviceAccount:${google_service_account.dlp_admin.email}"
}

resource "google_service_account_key" "dlp_admin" {
  service_account_id = google_service_account.dlp_admin.name
  public_key_type    = "TYPE_X509_PEM_FILE"
}

resource "local_file" "service_account" {
  content  = base64decode(google_service_account_key.dlp_admin.private_key)
  filename = "service_account.json"
}
Review the resources that are going to be created, and then run terraform apply to make the changes.
In a new directory, create a variables.tf file and copy the following contents:
variable "service_account_key_path" {
  type        = string
  description = "Path to where the service account key is located"
}

variable "project_id" {
  type        = string
  description = "The project to create the resources in"
}

variable "region" {
  type        = string
  description = "The region to create the resources in"
}

variable "bucket_path" {
  type        = string
  description = "The bucket path to inspect with DLP"
}
Then fill out the corresponding terraform.tfvars file:
service_account_key_path = "" project_id = "" region = "" bucket_path = ""
Create the following provider.tf file and run terraform init:

provider "google" {
  project     = var.project_id
  region      = var.region
  credentials = var.service_account_key_path
}

terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 3"
    }
  }
}
Create the following main.tf file and run terraform plan:

data "google_project" "current" {}

resource "google_data_loss_prevention_inspect_template" "basic" {
  parent = data.google_project.current.id
}

resource "google_bigquery_dataset" "findings" {
  dataset_id                 = "findings"
  delete_contents_on_destroy = true
}

resource "google_data_loss_prevention_job_trigger" "basic" {
  parent       = data.google_project.current.id
  display_name = "Scan ${var.bucket_path}"

  triggers {
    schedule {
      recurrence_period_duration = "86400s"
    }
  }

  inspect_job {
    inspect_template_name = google_data_loss_prevention_inspect_template.basic.id

    actions {
      save_findings {
        output_config {
          table {
            project_id = data.google_project.current.project_id
            dataset_id = google_bigquery_dataset.findings.dataset_id
          }
        }
      }
    }

    storage_config {
      cloud_storage_options {
        file_set {
          url = "gs://${var.bucket_path}/**"
        }
      }

      timespan_config {
        enable_auto_population_of_timespan_config = true

        timestamp_field {
          name = "timestamp"
        }
      }
    }
  }
}
Review the resources that are going to be created, and then run terraform apply to make the changes.
Discussion
This recipe configured GCP’s Data Loss Prevention (DLP) service and created a daily DLP job that scans the specified Cloud Storage bucket.
DLP on GCP is a multifaceted service that can integrate into a variety of applications and architectures to ensure that your data is classified and handled appropriately. In this instance, you have set up a scheduled job that scans a particular storage bucket. You may wonder why the recipe does not start scanning all the buckets that exist, and that is because DLP can quickly become an expensive service to operate. This recipe is a way of dipping your toe in the water without the risk of a scary bill arriving at the end of the month. Another option to explore when productionizing your DLP configuration is to sample data. This is where you make determinations on a random sample of the data, rather than having to process and pay for it all.
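If you want to experiment with sampling, the job trigger’s cloud_storage_options block can be told to inspect only part of the data. The following is a minimal sketch, assuming your version of the google provider exposes these sampling arguments; the percentages are illustrative only.

# Sketch: a sampled variant of the cloud_storage_options block in
# google_data_loss_prevention_job_trigger.basic.
cloud_storage_options {
  file_set {
    url = "gs://${var.bucket_path}/**"
  }

  # Assumed provider arguments: inspect at most 10% of each file's bytes
  # and only 20% of the files, starting from a random offset.
  bytes_limit_per_file_percent = 10
  files_limit_percent          = 20
  sample_method                = "RANDOM_START"
}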
For the scanning of static data, DLP can also run jobs directly against BigQuery datasets and Datastore kinds, as well as Cloud Storage buckets, allowing you to understand where the most valuable data lies. Additionally, by automatically forwarding the findings into BigQuery, it is possible to dynamically query the output of DLP to ensure you can find and triage the highest-priority findings.
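For example, pointing a job trigger at a BigQuery table only requires swapping the contents of storage_config. This is a minimal sketch; the dataset and table names are hypothetical.

# Sketch: a storage_config targeting a BigQuery table instead of Cloud Storage.
storage_config {
  big_query_options {
    table_reference {
      project_id = data.google_project.current.project_id
      dataset_id = "customer_data" # hypothetical dataset
      table_id   = "orders"        # hypothetical table
    }
  }
}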
The service comes with over 140 preconfigured infoType detectors, allowing you to automatically identify common forms of PII, from Australian Medicare numbers, to US Social Security numbers, and everything in between. You can also construct your own detectors to classify data that is unique to your business.
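As an illustration, the following sketch declares an inspect template that combines built-in infoTypes with a custom detector; the INTERNAL_CUSTOMER_ID name and its regex are hypothetical.

resource "google_data_loss_prevention_inspect_template" "custom" {
  parent       = data.google_project.current.id
  display_name = "Custom inspection"

  inspect_config {
    # Built-in detectors for common PII.
    info_types {
      name = "EMAIL_ADDRESS"
    }
    info_types {
      name = "US_SOCIAL_SECURITY_NUMBER"
    }

    # A business-specific detector; the name and pattern are hypothetical.
    custom_info_types {
      info_type {
        name = "INTERNAL_CUSTOMER_ID"
      }
      regex {
        pattern = "CUST-[0-9]{8}"
      }
      likelihood = "LIKELY"
    }
  }
}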
In addition to identifying sensitive data, DLP also provides pseudonymization capabilities, allowing for the replacement of sensitive data with nonidentifying tokens, preserving the data utility while minimizing the data risk when it is used. You can also configure it to do automatic redaction, ensuring the PII is not allowed to cross security boundaries.
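The following is a minimal sketch of a de-identification template, assuming your provider version supports the google_data_loss_prevention_deidentify_template resource; here detected values are simply masked with # characters rather than tokenized.

resource "google_data_loss_prevention_deidentify_template" "mask" {
  parent       = data.google_project.current.id
  display_name = "Mask detected PII"

  deidentify_config {
    info_type_transformations {
      transformations {
        # With no explicit info_types listed, the transformation applies to
        # every infoType found by the inspection configuration.
        primitive_transformation {
          character_mask_config {
            # Leaving number_to_mask unset masks the entire matched value.
            masking_character = "#"
          }
        }
      }
    }
  }
}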
Common solutions in the space, although outside of the remit of the security function, are automatic data classifiers, where data is placed into a staging bucket before being processed and segregated into sensitive and nonsensitive data. Another option is constructing a Dataflow pipeline that automatically redacts and pseudonymizes data as it flows through in real time.
Summary
Let’s summarize what was learned and deployed in this recipe:
- Google’s Data Loss Prevention (DLP) service is critical to managing sensitive data at scale.
- You can leverage DLP to routinely scan storage locations to automatically classify data and report findings.
- The findings can be automatically forwarded into BigQuery, allowing you to query your data.
- DLP can get expensive at scale, so focusing your scans on particularly risky areas, using sampling, and only scanning modified data can keep costs under control.
- DLP also provides other services, such as pseudonymization and redaction, allowing you to ensure that data can still be utilized but with significantly reduced risk.
- You created a DLP inspection template and a job trigger to automatically scan a Cloud Storage bucket every day.
4.11 Preventing Data Loss on AWS
Solution
If you haven’t already done so, familiarize yourself with Terraform and the different authentication mechanisms in Chapter 11.
Create a variables.tf file and copy the following contents:
variable "delegated_admin_account" {
  type        = string
  description = "The account ID for the account to be the Macie delegated admin"
}

variable "cross_account_role" {
  type        = string
  description = "The cross account role to assume"
}
Then fill out the corresponding terraform.tfvars file:
delegated_admin_account = "" cross_account_role = ""
Create the following provider.tf file and run terraform init:

provider "aws" {}

provider "aws" {
  alias = "delegated_admin_account"

  assume_role {
    role_arn = join("", [
      "arn:aws:iam::",
      var.delegated_admin_account,
      ":role/",
      var.cross_account_role
    ])
  }
}

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3"
    }
  }
}
Create the following main.tf file and run terraform plan:

data "aws_organizations_organization" "this" {}

resource "aws_macie2_account" "payer" {}

resource "aws_macie2_organization_admin_account" "this" {
  admin_account_id = var.delegated_admin_account

  depends_on = [aws_macie2_account.payer]
}
resource "aws_macie2_member" "account" {
  provider = aws.delegated_admin_account

  for_each = {
    for account in data.aws_organizations_organization.this.accounts :
    account.id => account if account.id != var.delegated_admin_account
  }

  account_id = each.value.id
  email      = each.value.email

  depends_on = [aws_macie2_organization_admin_account.this]
}
Review the resources that are going to be created, and then run terraform apply to make the changes.
Discussion
This recipe configured Amazon Macie, giving you a single view of PII in S3 buckets across your organization.
Amazon Macie is a service focused on making the mass of objects in S3, in many cases terabytes to petabytes of data, understandable from a sensitivity perspective. One of the main features is the evaluation of S3 bucket configuration, looking at the following:
- Which buckets are publicly accessible for read or write operations
- Whether buckets have default encryption that is enforced by bucket policies
- Where buckets are shared, both within the organization and with external parties
On top of this, Amazon provides a variety of managed data identifiers that detect sensitive data, such as PII, PHI, and financial information. Additionally, you can create custom data identifiers to detect sensitive data that is unique to your organization or business domain. In doing so, you can cross-reference what is being stored with how it is being stored, ensuring that appropriate levels of protection are applied to your most sensitive assets.
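As a minimal sketch, a custom data identifier could be defined in Terraform like the following; the name, regex, and keywords are hypothetical and should be replaced with patterns for your own data.

resource "aws_macie2_custom_data_identifier" "internal_id" {
  provider = aws.delegated_admin_account

  # Hypothetical pattern for an internal customer identifier.
  name                   = "internal-customer-id"
  regex                  = "CUST-[0-9]{8}"
  keywords               = ["customer", "cust-id"]
  maximum_match_distance = 10

  depends_on = [aws_macie2_organization_admin_account.this]
}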
Whenever Macie detects a potential issue, it raises a finding. Each finding provides a severity rating, information about the affected resource, and metadata about when and how Macie discovered the issue. These findings can be sent directly into AWS Security Hub, as was configured in Recipe 3.2. They are also automatically loaded on Amazon EventBridge, which allows you to create and trigger bespoke workflows upon certain findings being raised.
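For example, the following sketch forwards Macie findings from EventBridge to an SNS topic for triage. The topic name is hypothetical, the event pattern is an assumption that filters only on the Macie source and detail type, and the topic would also need a policy allowing EventBridge to publish to it.

resource "aws_sns_topic" "macie_findings" {
  provider = aws.delegated_admin_account

  name = "macie-findings" # hypothetical topic name
}

resource "aws_cloudwatch_event_rule" "macie_findings" {
  provider = aws.delegated_admin_account

  name        = "macie-findings"
  description = "Forward Amazon Macie findings for triage"

  event_pattern = jsonencode({
    source        = ["aws.macie"]
    "detail-type" = ["Macie Finding"]
  })
}

resource "aws_cloudwatch_event_target" "macie_findings" {
  provider = aws.delegated_admin_account

  rule = aws_cloudwatch_event_rule.macie_findings.name
  arn  = aws_sns_topic.macie_findings.arn
}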
All this data is brought into a dashboard, giving you a simple visual way of identifying issues in your environment. By enabling Macie across the entire organization, this recipe allows you to review findings across all accounts from one central location.
In order for Macie to read the data in buckets where restrictive bucket policies are applied, you will need to ensure that an exception for the Macie service role is included. For example, given a bucket policy that only allows a certain role to access the bucket, you need to add an extra condition for the Macie service role, like so:
data "aws_caller_identity" "current" {}

data "aws_iam_policy_document" "restricted" {
  statement {
    effect  = "Deny"
    actions = ["s3:*"]

    resources = [
      "${aws_s3_bucket.bucket.arn}/*",
      aws_s3_bucket.bucket.arn
    ]

    principals {
      type        = "*"
      identifiers = ["*"]
    }

    condition {
      test     = "StringNotLike"
      variable = "aws:PrincipalArn"
      values = [join("", [
        "arn:aws:iam::",
        data.aws_caller_identity.current.account_id,
        ":role/RestrictedBucketAccessRole"
      ])]
    }

    condition {
      test     = "StringNotLike"
      variable = "aws:PrincipalArn"
      values = [join("", [
        "arn:aws:iam::",
        data.aws_caller_identity.current.account_id,
        ":role/aws-service-role/macie.amazonaws.com/AWSServiceRoleForAmazonMacie"
      ])]
    }
  }
}
Summary
Let’s summarize what was learned and deployed in this recipe:
- To prevent the loss of sensitive data in your estate, it is critical to know where the data is.
- Amazon Macie allows for the identification of sensitive data across your estate.
- Macie provides a variety of managed data identifiers that automatically classify data.
- It also looks at the configuration of the S3 buckets to identify potential issues.
- Issues discovered in configuration or data protection are raised as findings.
- This recipe configured Macie centrally, so all findings can be triaged and actioned from a single location.
- To get the best value from Macie, you may need to update bucket policies, allowing its service role to access the objects in the buckets.
4.12 Preventing Data Loss on Azure
Solution
If you haven’t already done so, familiarize yourself with Terraform and the different authentication mechanisms in Chapter 11.
Create a variables.tf file and copy the following contents:
variable "location" {
  type        = string
  description = "The location to deploy your resource into"
}

variable "purview_account_name" {
  type        = string
  description = "The name for the Purview account"
}

variable "storage_account_name" {
  type        = string
  description = "The name for the storage account"
}
Then fill out the corresponding terraform.tfvars file:
location = "" purview_account_name = "" storage_account_name = ""
Create the following provider.tf file and run terraform init:

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 2"
    }
    local = {
      source  = "hashicorp/local"
      version = "~> 2"
    }
    null = {
      source  = "hashicorp/null"
      version = "~> 3"
    }
  }
}

provider "azurerm" {
  features {}
}
Install the purviewcli tool from the purviewcli GitHub repository. This recipe was developed against version 0.1.31, so it may require modification if you install a later version.
Create the following main.tf file and run terraform plan:

data "azurerm_client_config" "current" {}

data "azurerm_subscription" "current" {}

resource "azurerm_resource_group" "purview" {
  name     = "purview-resources"
  location = var.location
}

resource "azurerm_purview_account" "purview" {
  name                = var.purview_account_name
  resource_group_name = azurerm_resource_group.purview.name
  location            = azurerm_resource_group.purview.location
  sku_name            = "Standard_4"
}

resource "azurerm_role_assignment" "data_curator" {
  scope                = azurerm_purview_account.purview.id
  role_definition_name = "Purview Data Curator"
  principal_id         = data.azurerm_client_config.current.object_id
}

resource "azurerm_role_assignment" "data_source_admin" {
  scope                = azurerm_purview_account.purview.id
  role_definition_name = "Purview Data Source Administrator"
  principal_id         = data.azurerm_client_config.current.object_id
}

resource "azurerm_storage_account" "purview" {
  name                     = var.storage_account_name
  resource_group_name      = azurerm_resource_group.purview.name
  location                 = azurerm_resource_group.purview.location
  account_tier             = "Standard"
  account_replication_type = "GRS"

  identity {
    type = "SystemAssigned"
  }
}

resource "azurerm_storage_container" "purview" {
  name                  = "purview"
  storage_account_name  = azurerm_storage_account.purview.name
  container_access_type = "private"
}

resource "azurerm_role_assignment" "reader" {
  scope                = azurerm_storage_account.purview.id
  role_definition_name = "Storage Blob Data Reader"
  principal_id         = azurerm_purview_account.purview.identity[0].principal_id
}

resource "local_file" "storage_account" {
  filename = "blob_storage.json"
  content  = <<CONTENT
{
  "id": "datasources/AzureStorage",
  "kind": "AzureStorage",
  "name": "AzureStorage",
  "properties": {
    "collection": null,
    "endpoint": "${azurerm_storage_account.purview.primary_blob_endpoint}",
    "location": "${azurerm_resource_group.purview.location}",
    "parentCollection": null,
    "resourceGroup": "${azurerm_resource_group.purview.name}",
    "resourceName": "${azurerm_storage_account.purview.name}",
    "subscriptionId": "${data.azurerm_subscription.current.subscription_id}"
  }
}
CONTENT
}

resource "local_file" "scan" {
  filename = "scan.json"
  content  = <<CONTENT
{
  "kind": "AzureStorageMsi",
  "properties": {
    "scanRulesetName": "AzureStorage",
    "scanRulesetType": "System"
  }
}
CONTENT
}

resource "null_resource" "add_data_source" {
  provisioner "local-exec" {
    command = join(" ", [
      "pv scan putDataSource",
      "--dataSourceName=AzureStorage",
      "--payload-file=${local_file.storage_account.filename}",
      "--purviewName ${azurerm_purview_account.purview.name}"
    ])
  }
}

resource "null_resource" "create_scan" {
  provisioner "local-exec" {
    command = join(" ", [
      "pv scan putScan",
      "--dataSourceName=AzureStorage",
      "--scanName=storage",
      "--payload-file=${local_file.scan.filename}",
      "--purviewName ${azurerm_purview_account.purview.name}"
    ])
  }

  depends_on = [null_resource.add_data_source]
}

resource "null_resource" "run_scan" {
  provisioner "local-exec" {
    command = join(" ", [
      "pv scan runScan",
      "--dataSourceName=AzureStorage",
      "--scanName=storage",
      "--purviewName ${azurerm_purview_account.purview.name}"
    ])
  }

  depends_on = [null_resource.create_scan]
}
Review the resources that are going to be created, and then run terraform apply to make the changes.
Discussion
This recipe created a Purview account and used it to scan a storage account for PII.
Azure Purview is a unified data governance service. By leveraging its capabilities, you are able to classify the data across your estate automatically against a collection of default rules that Microsoft provides. With the potential scale and sprawl that Azure allows you to achieve, having the right tools in place to understand where data is and how it is protected is critical.
The default rules detect many kinds of PII, such as:
- US/UK passport numbers
- Australian bank account numbers
- IP addresses
In order for Purview to be able to access the data, you need to give the Purview managed identity access to the resources in your estate, which you did in this recipe by creating the azurerm_role_assignment.reader resource. By granting the identity the required permissions at higher-level scopes, the access filters down, rather than having to be applied directly to every resource.
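For instance, a single role assignment at the subscription scope would let Purview read every storage account beneath it. This is a minimal sketch reusing the data sources and Purview account already defined in the recipe.

resource "azurerm_role_assignment" "subscription_reader" {
  # Grant the Purview managed identity read access to all blob data in the subscription.
  scope                = data.azurerm_subscription.current.id
  role_definition_name = "Storage Blob Data Reader"
  principal_id         = azurerm_purview_account.purview.identity[0].principal_id
}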
Additionally, as your use of Purview matures and scales, use collections to keep your data map manageable and to enable more nuanced and flexible identity and access management. Here, you simply registered the resource under the default collection, but a common pattern is to segment by business unit. This also allows you to apply only the relevant scans to each collection, ensuring performance and cost-effectiveness.
In this recipe, you executed an ad hoc scan, but for full production use, you need to decide how frequently to schedule scans based on cost, risk, and value. To manage the cost aspect, it is also possible to run incremental scans so that you focus on new and changed data rather than redundantly rescanning old data. Additionally, you can build your own classification rules using regular expressions and Bloom filters, so you can identify the data that is specifically critical to your business.
Azure Purview also provides many integrations that allow it to operate in both multicloud and hybrid cloud environments. Connectors already exist for services such as:
- SAP HANA
- On-premises SQL Server
- Amazon S3
- Google BigQuery
By supporting data sources outside of Azure as first-class citizens, Purview has the potential to be the centralized data governance tooling for any business with an Azure presence, ensuring that you can have a single pane of glass, a single classification engine, and no redundant effort when managing a suite of tools for a heterogeneous environment.
Summary
Let’s summarize what was learned and deployed in this recipe:
- At scale, the hardest thing about data is understanding what you have and where it lives.
- Azure Purview is a centralized data governance platform that allows you to classify data.
- You deployed a Purview account and an Azure storage account to hold some sensitive data.
- By programmatically running scans, you can ensure that your data is classified.
- Scans can be configured to run on a schedule and against collections of resources.
- Azure Purview has first-class support for resources outside of Azure, allowing it to become a truly unified data governance tool and approach.