Chapter 4. Protecting Your Data

Protecting your data requires a delicate balance between enabling data access for informed decision making and implementing robust controls to protect sensitive information. Striking this balance establishes trust and supports compliance: data consumers get quick and easy access to the data they are authorized to use, and the organization avoids introducing additional business risk.

Let’s take a look at the key considerations and strategies that empower organizations to achieve this balance of easy access with effective control.

The Value of Protecting Your Data

Organizations store and protect sensitive data, including personally identifiable information (PII) about customers and employees and other internal information such as preliminary financial data or detailed competitive plans and supporting analysis. If this data is improperly disclosed, it can create mistrust and expose the company to fines or, at minimum, damage its reputation or market position. Yet data must be used every day for the company to function, for example to support sales staff with comprehensive information about customers and to publish timely and accurate financial statements.

Sensitive data must be used widely for business purposes without excessive delay, and it must also be safeguarded from inappropriate use. Data governance therefore needs efficient processes, procedures, policies, and supporting technologies so that people and groups can be easily authorized to use data for their job functions.

While information security, compliance, and privacy departments possess deep expertise in security practices, regulations, and tools, they need to partner with data governance to develop and apply policies to specific domains of data and to decide how regulations apply to these domains. For example, the privacy department brings expertise about which data to classify as PII, data handling policies, and acceptable use policies within the organization. An example policy is to always mask PII unless it is the customer accessing their own data from within a secure application. It is the job of the technology teams to automate the definition of the masking policy, automate the classification of sensitive data, and ensure the policy is auditable.
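The example masking policy above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reference implementation: the `AccessContext` fields, the `PII_FIELDS` set, and the rule that a customer may see only their own unmasked record are hypothetical names and choices introduced here.

```python
from dataclasses import dataclass

# Hypothetical set of fields that the privacy team has classified as PII.
PII_FIELDS = {"email", "phone", "ssn"}


@dataclass(frozen=True)
class AccessContext:
    """Who is asking for the data and through which channel (assumed fields)."""
    requester_id: str
    is_customer: bool
    via_secure_app: bool


def mask_value(value: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters with asterisks."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def apply_masking_policy(record: dict, owner_id: str, ctx: AccessContext) -> dict:
    """Return a copy of `record` with PII masked unless the requester is the
    customer who owns the record and is using the secure application."""
    unmasked = (
        ctx.is_customer
        and ctx.via_secure_app
        and ctx.requester_id == owner_id
    )
    if unmasked:
        return dict(record)
    return {
        key: mask_value(str(value)) if key in PII_FIELDS else value
        for key, value in record.items()
    }
```

Keeping the decision in one function makes it straightforward to log each call, which is one way to keep the policy auditable.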

Capabilities for Protecting Data While Balancing Access and Control

A comprehensive data governance program includes a full set of security capabilities in partnership with InfoSec and other groups throughout the organization.

The core capabilities that form the foundation for sharing data responsibly include the following:

  • Data security

  • Data compliance

  • Data lifecycle management

Data Security

Data security is the practice of granting data access to the right users and protecting data against corruption and theft.

Security is a broad concept that covers identity and access management, infrastructure protection, data protection, logging and monitoring, and incident response. For the purpose of this book, we are focused on data security, data compliance, and data lifecycle policies.

This concept encompasses a comprehensive range of information security responsibilities, including:

  • Maintaining the physical security of hardware and storage devices (and ensuring cloud providers offer this capability) as well as the logical security of software applications

  • Formulating and implementing organizational policies and procedures to ensure data protection

  • Enabling identity and access management, including identity system management and integration of identity systems with system authentication and data authorizations

  • Managing role-based access controls and purpose-based access controls

  • Providing protection not only against cybercriminal activities but also against insider threats and human errors
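To make the distinction between role-based and purpose-based access controls concrete, here is a minimal sketch in Python. The `Grant` structure, the role names, the dataset name, and the purposes are all hypothetical; a real deployment would rely on the policy engine of its data platform rather than an in-memory list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """A role may use a dataset, but only for the listed purposes."""
    role: str
    dataset: str
    purposes: frozenset


# Hypothetical grants defined by a data owner.
GRANTS = [
    Grant("sales_rep", "customer_contacts", frozenset({"account_management"})),
    Grant("analyst", "customer_contacts", frozenset({"churn_analysis"})),
]


def is_access_allowed(user_roles, dataset, purpose, grants=GRANTS) -> bool:
    """Allow access only when one of the user's roles has a grant on the
    dataset (role-based check) that covers the declared purpose
    (purpose-based check)."""
    return any(
        grant.role in user_roles
        and grant.dataset == dataset
        and purpose in grant.purposes
        for grant in grants
    )
```

In this sketch a sales representative can read `customer_contacts` for account management, but the same role asking for the same data for a marketing campaign is denied because no grant covers that purpose.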

Achieving this level of security involves deploying tools and technologies that enhance an organization’s visibility into the location and usage of critical data. These tools should be equipped to apply various protective measures, such as authentication, authorization, encryption, and data masking. Additionally, automated reporting streamlines audits and helps ensure adherence to regulatory requirements.

Data security also includes development of data classification policies that define levels of sensitivity (e.g., public, internal, confidential, highly confidential) and the appropriate protections and tools to secure data at rest and in transit at each of the sensitivity levels. Data owners then use their data curation processes and data catalogs to apply these classifications to their domain of data (customer, product, sales, etc.) by determining which sensitivity level the domain and its associated data elements belong to. Next, working with partners in business and IT, they apply the appropriate security practices to the data.
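A classification policy of this kind can be expressed as a small lookup from sensitivity level to required protections. The sketch below is illustrative only: the specific protections attached to each level, and the rule that a domain inherits the highest sensitivity among its data elements, are assumptions made for the example rather than requirements stated in this chapter.

```python
from dataclasses import dataclass
from enum import IntEnum


class Sensitivity(IntEnum):
    """Sensitivity levels, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    HIGHLY_CONFIDENTIAL = 3


@dataclass(frozen=True)
class Protections:
    encrypt_at_rest: bool
    encrypt_in_transit: bool
    mask_by_default: bool


# Hypothetical policy table; real values come from the organization's policy.
PROTECTIONS = {
    Sensitivity.PUBLIC: Protections(False, True, False),
    Sensitivity.INTERNAL: Protections(True, True, False),
    Sensitivity.CONFIDENTIAL: Protections(True, True, True),
    Sensitivity.HIGHLY_CONFIDENTIAL: Protections(True, True, True),
}


def domain_sensitivity(element_levels: dict) -> Sensitivity:
    """Assumed rule: a domain is as sensitive as its most sensitive element."""
    return max(element_levels.values(), default=Sensitivity.PUBLIC)


def required_protections(element_levels: dict) -> Protections:
    """Look up the protections for the domain's overall sensitivity level."""
    return PROTECTIONS[domain_sensitivity(element_levels)]
```

For instance, a customer domain containing an internal `name` element and a highly confidential `ssn` element would be treated as highly confidential under this assumed rule.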

To classify data within the data curation process, we recommend that organizations fully automate their data classification policies across all their structured and unstructured data. This typically requires technology that uses statistics and ML to identify the data and classify it accordingly. Examples of categories to detect include PII, Payment Card Industry (PCI) data, protected health information (PHI) covered by the Health Insurance Portability and Accountability Act (HIPAA), and inappropriate content that should be moderated.
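Production classifiers typically combine statistics and ML, but the basic idea of scanning values and assigning labels can be shown with a much simpler stand-in. The sketch below uses regular expressions and a Luhn checksum instead of ML; the patterns, the labels, and the 50 percent column threshold are illustrative assumptions, and pattern matching like this will miss many real-world cases.

```python
import re
from collections import Counter

# Simplified patterns, for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(number: str) -> bool:
    """Check a candidate card number with the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return len(digits) >= 13 and total % 10 == 0


def classify_value(value: str) -> set:
    """Return the set of sensitivity labels detected in a single value."""
    labels = set()
    if EMAIL_RE.search(value) or SSN_RE.search(value):
        labels.add("PII")
    if any(luhn_valid(match.group()) for match in CARD_RE.finditer(value)):
        labels.add("PCI")
    return labels


def classify_column(values, threshold: float = 0.5) -> set:
    """Label a column when at least `threshold` of its values carry a label."""
    values = list(values)
    counts = Counter()
    for value in values:
        counts.update(classify_value(str(value)))
    return {
        label
        for label, count in counts.items()
        if values and count / len(values) >= threshold
    }
```

Labels produced this way could then be written to the data catalog as tags, so that downstream access and masking policies can act on them.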

Data Compliance

Data compliance is the practice of following government regulations to ensure that sensitive data is managed in accordance with the public interest and any other interests reflected in regulations.

Regulations can be broadly applicable, such as the General Data Protection Regulation (GDPR) from the European Union (EU), which mandates protection of personal data by, for example, allowing individuals in the EU to correct their personal data and to be made aware of how their data is used. The California Consumer Privacy Act (CCPA) provides similar protections for California residents.

Other regulations may be narrowly focused. For example, the Genetic Information Nondiscrimination Act (GINA) is a US law that prevents insurers from using genetic information to make decisions about a person’s eligibility, coverage, underwriting, or premium costs. It also bars employers from making hiring, firing, promotion, or any other employment decisions based on a person’s genetic information.

Any modern data governance program must stay up to date on these regulations as they emerge and evolve, and must determine how they apply to the organization’s data assets.

Data Lifecycle Management

A fundamental driver for data lifecycle management is appropriately safeguarding data as it ages.

With vast data growth, persisting everything indefinitely in active primary storage incurs unnecessary costs, yet data often must be retained for long periods for infrequent business use or for regulatory compliance. For example, according to the U.S. Department of Labor, under the Fair Labor Standards Act (FLSA), employers must keep certain records, such as payroll records, for at least three years. Defined lifecycle policies make it possible to automatically transition less business-critical data into cost-efficient secondary storage tiers that still provide data retention and protection. Disposing of data that is no longer needed for any purpose reduces the exposure created by stale datasets that are well past their usefulness.
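A lifecycle policy of this kind reduces to a simple rule based on the age of the data. The following sketch assumes a hypothetical policy with two thresholds, `archive_after_days` and `retain_days`; the 90-day archive threshold is an arbitrary example, and the three-year retention period echoes the FLSA example only as an illustration, not as legal guidance.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LifecyclePolicy:
    """Hypothetical policy: when to move data to a cheaper tier, when to dispose."""
    archive_after_days: int
    retain_days: int


def lifecycle_action(created: date, today: date, policy: LifecyclePolicy) -> str:
    """Decide what to do with a dataset based on its age in days."""
    age_days = (today - created).days
    if age_days >= policy.retain_days:
        return "dispose"       # retention period has elapsed
    if age_days >= policy.archive_after_days:
        return "archive"       # move to a cost-efficient secondary tier
    return "keep_active"       # still in primary storage


# Example policy: archive after 90 days, retain for three years.
PAYROLL_POLICY = LifecyclePolicy(archive_after_days=90, retain_days=3 * 365)
```

Managed storage services usually let you declare rules like these as configuration instead of code; the function above only shows the decision such a rule encodes.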

Advanced tools automate the policy-based progression of data across cost-optimized storage tiers with integrated protection controls. This provides a governed path that balances accessibility, security, and budget through every phase of the data’s life, up to its final disposition. With today’s lifecycle capabilities, aging data no longer has to mean risky data bloat: it can be managed reliably and economically, mitigating threats while avoiding the consequences of retaining data beyond established requirements.

Having explored the essential capabilities for protecting your data, let’s delve into the technology needed for data security and compliance.

Technology You Need

To build the data governance capabilities to protect your data, you need technology that supports proficiency in each capability area:

Data security

Modern data security solutions provide unified visibility and fine-grained access controls that are deeply integrated with data catalogs and data consumption/production technologies such as data warehouses, ETL tools, SQL engines, and data processing engines like Apache Spark. Fine-grained access controls are often integrated with policies for role-based access controls, purpose-based access controls, encryption, masking, and monitoring based on tagging policies as data traverses multiple services and repositories.

In addition to authorizing data access, organizations use advanced behavioral analytics and ML algorithms to detect credential misuse, abnormal queries, unauthorized access attempts, ransomware activity, and insider threats in real time and to trigger automated responses.
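The detection products described here rely on far richer behavioral models, but the underlying idea of comparing current activity with a user’s baseline can be illustrated with a simple z-score check. This is a deliberately simplified stand-in, not an ML algorithm; the function name, the per-user query counts, and the threshold of 3.0 are assumptions for the example.

```python
from statistics import mean, pstdev


def is_anomalous(history, current, z_threshold: float = 3.0) -> bool:
    """Flag `current` (e.g., queries issued today) when it deviates from the
    user's historical counts by more than `z_threshold` standard deviations."""
    if len(history) < 2:
        return False  # not enough history to establish a baseline
    baseline = mean(history)
    spread = pstdev(history)
    if spread == 0:
        return current != baseline  # any change from a constant baseline
    return abs(current - baseline) / spread > z_threshold
```

A user who normally runs about 20 queries a day and suddenly runs 400 would be flagged, and that flag could trigger an automated response such as suspending the session pending review.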

Data compliance

Technology to support data compliance continuously audits your system usage to help assess risk and compliance with regulations and industry standards. Automated tools help you store and access compliance reports in a self-service portal.
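As a rough illustration of what such auditing produces, the sketch below summarizes a list of access events into figures that a compliance report might include. The event fields (`user`, `dataset`, `allowed`) and the shape of the summary are hypothetical; real audit tools define their own log schemas.

```python
from collections import Counter


def summarize_access_log(events) -> dict:
    """Summarize access events for a compliance report.

    Each event is assumed to be a dict with 'user', 'dataset', and
    'allowed' keys (a hypothetical schema used only for this example).
    """
    events = list(events)
    denied = Counter(event["user"] for event in events if not event["allowed"])
    return {
        "total_events": len(events),
        "denied_events": sum(denied.values()),
        "denied_by_user": dict(denied),
    }
```

A summary like this could be generated on a schedule and published to the self-service portal alongside the raw audit records.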

Data lifecycle

Data lifecycle management technology provides cost-effective storage and retrieval along with automated movement of data based on data access and archival requirements and changes in requirements over time. This type of automated data movement is used for common data storage, such as object storage, and also within specialized storage technologies, such as purpose-built databases.

Now that we’ve covered the importance of protecting your data, let’s look at how AWS enables data governance.
