Chapter 4. Safeguarding Data with Clean Rooms

With the rise of data privacy regulations such as the GDPR and the CCPA and the increasing demand for external data sources, such as third-party data providers and data marketplaces, organizations need a secure, controlled, and private way to collaborate on data with their customers and partners. However, traditional data sharing solutions often require data replication and trust-based agreements, which expose organizations to potential risks of data misuse and privacy breaches.

The demand for data clean rooms has been growing in various industries and use cases due to the changing security, compliance, and privacy landscape, the fragmentation of the data ecosystem, and the new ways to monetize data. According to Gartner, 80% of advertisers that spend more than $1 billion annually on media will use data clean rooms by 2023.1 However, existing solutions have limitations on data movement and replication, are restricted to SQL, and are hard to scale.

The Databricks Data Intelligence Platform provides a comprehensive set of tools to build, serve, and deploy a scalable, flexible, and interoperable data clean room based on your data privacy and governance requirements. Its features include secure data sharing with no replication, full support for running arbitrary workloads and languages, easy scalability with a guided onboarding experience, isolated compute, and privacy-safe collaboration with fine-grained access controls.

Databricks Clean Rooms enable organizations to share and join their existing data in a secure, governed, and privacy-safe environment. Participants in the Databricks Clean Rooms can perform analysis on the joined data using common languages such as Python and SQL without the risk of exposing their data to other participants. Participants have full control of their data and can decide which participants can perform what analysis on their data without exposing sensitive data, such as personally identifiable information (PII).

This chapter provides an in-depth look at how Databricks Clean Rooms work and how they can help organizations guard their data privacy. The chapter also explores key partnership integrations that enhance the capabilities of Databricks Clean Rooms and provide additional benefits for data privacy and security. By using Databricks Clean Rooms with these partner solutions, organizations can unlock new insights and opportunities from their data while preserving data privacy.

Challenges and Solutions for Safeguarding Data

Implementing and using Databricks Clean Rooms can present several challenges, but Databricks provides solutions to address these issues effectively:

Data privacy and security

One of the main challenges is ensuring data privacy and security. When multiple participants join their first-party data and perform analysis, there is a risk of exposing sensitive data to other participants. Databricks Clean Rooms address this risk by providing a secure, governed, and privacy-safe environment in which participants analyze the joined data without exposing their underlying data to one another.

Data standardization

Several data clean rooms have not yet adopted universal standards for their implementation. This means that platforms and advertisers may be trying to pool data that exists in multiple formats, which makes the prep work for aggregating those formats time-consuming. Databricks Clean Rooms allow computations to be run in any language, including SQL, R, Scala, Java, and Python. This enables simple use cases such as joins as well as complex computations such as machine learning, and it supports data in multiple formats.

Scalability

As the number of participants increases, it becomes more difficult to manage the clean room environment. Databricks Clean Rooms are designed to easily scale to multiple participants. They also reduce time to insights with predefined templates for common clean room use cases.

Interoperability

Interoperability is the ability of different systems, devices, or applications to exchange and use data without requiring special adaptations or conversions. Interoperability is a significant challenge when working with data across different clouds, regions, and platforms. With Delta Sharing, clean room collaborators can work together across clouds, across regions, and even across data platforms without requiring data movement.

By addressing these challenges, Databricks Clean Rooms enable businesses to collaborate securely on any cloud in a privacy-safe way.

Databricks Clean Rooms Explained

Databricks Clean Rooms are innovative, closed-loop environments designed to facilitate seamless collaboration among disparate parties, all while safeguarding sensitive information and preserving proprietary data. By providing a secure and privacy-safe haven for data sharing, Clean Rooms emerge as a pivotal asset for organizations navigating intricate data-driven endeavors.

As organizational data flows from various sources, including media platforms, walled gardens, and collaborative partners, Clean Rooms offer a sanctuary in which data privacy remains unwavering. They cater to the unique needs of data-driven organizations seeking to unlock insights from diverse sources within a fragmented and regulated data ecosystem. Clean Rooms offer a host of compelling advantages, each contributing to a robust and efficient data collaboration framework. They give partners access to data and intellectual property, which lets them collaborate and innovate, explore new use cases, and build automated workflows. The gains extend further, bringing heightened efficiency, productivity, and scalability, all of which are indispensable for organizations operating at scale. Notably, Clean Rooms also help safeguard existing investments by protecting valuable data assets in an ever-evolving digital landscape.

The applications of Clean Rooms span a multitude of industries, each finding its unique utility. From optimizing campaign performance and enhancing personalization in consumer-centric fields to curbing fraud and mitigating risk in the financial sector, Clean Rooms empower data-driven decision making.

Constructing and nurturing Clean Rooms is a methodical process, orchestrated through the establishment of secure data connections across cloud platforms, the utilization of containerized code to fuel diverse use cases, and the defining of roles and permissions for collaborators. The foundation of privacy is fortified through the application of privacy-enhancing technologies (PETs), including encryption, obfuscation, data minimization, differential privacy, and noise injection. Stringent policies further ensure that data usage adheres to approved queries only, fostering a controlled and privacy-conscious environment.
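
To make two of the techniques named above more concrete, differential privacy and noise injection, here is a minimal sketch of the Laplace mechanism applied to a counting query. It is an illustration under stated assumptions rather than a description of how Databricks Clean Rooms implement privacy: the function names, the epsilon value, and the example count are all hypothetical, and a production system would use a vetted library and a cryptographically secure source of randomness.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample zero-mean Laplace noise with the given scale.

    The difference of two independent exponential draws, each with mean
    `scale`, follows a Laplace(0, scale) distribution.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count with noise calibrated to the privacy budget epsilon.

    Adding or removing one person changes a count by at most 1, so the
    sensitivity is 1 and the Laplace scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical values; the fixed seed is only there to make the sketch repeatable.
rng = random.Random(7)
released = noisy_count(true_count=1200, epsilon=0.5, rng=rng)
print(f"released count: {released:.1f}")
```

A smaller epsilon adds more noise and gives stronger privacy, while a larger epsilon gives a more accurate but less private answer.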

Clean Rooms promote interoperability and automation by encouraging collaboration across multiple clouds and platforms without necessitating data movement. This environment provides a unified user experience, supports templatized analytics and natural language queries, simplifies user interactions, and facilitates ease of use. Realizing the potential of Clean Rooms necessitates a well-defined strategy and adherence to best practices. This journey encompasses pivotal steps such as data auditing, stakeholder education, sandbox testing, partner onboarding, ongoing operations management, preparation for data science tasks, centralization of outputs, and dissemination of insights.

Now available in private preview, Databricks Clean Rooms can effectively compartmentalize data, fostering collaborative workflows that propel innovation while steadfastly upholding privacy and regulatory compliance. In an era in which the demand for external data soars, Clean Rooms emerge as a secure data exchange, enabling organizations to embrace external data sources with confidence and drive data-driven evolution.

Crossing industries, Clean Rooms redefine possibilities. Consumer packaged goods (CPG) companies can harness the synergy of first-party advertisement and point-of-sale (POS) transactional data for sales uplift. The media industry could see a new era of targeted advertising, enhanced segmentation, and transparency in ad effectiveness, all while preserving data privacy. In financial services, the value chain aligns for proactive fraud detection and anti-money-laundering strategies. As organizations strive to balance innovation and compliance, Clean Rooms will secure data collaboration, shaping a future in which insights flow freely and privacy remains persistent.

Key Partnership Integrations

Databricks Clean Rooms are strengthened by key partnerships with industry-leading companies like Habu, Datavant, LiveRamp, and TransUnion. These partnerships are crucial pillars that elevate the data privacy and security capabilities of Databricks Clean Rooms. Let’s look at a few of these partnerships that are enabling enhanced data-driven insights:

Habu

Habu offers a software platform that empowers brands to construct clean rooms alongside their partners, extending the ability to measure marketing campaign impact across diverse channels and platforms. The uniform integration with Databricks amplifies user experience and cross-platform interoperability, opening avenues for insightful analyses that uphold data privacy standards. Habu offers a simple and intuitive interface, supports data sharing across different clouds and platforms, and enables privacy-preserving analytics with Databricks.

Datavant

Datavant offers innovative tokenization technology, a boon for healthcare organizations seeking to unleash the potential of data-driven healthcare analysis. The partnership provides a transformative capability through the integration of Datavant tokens within a Databricks clean room, allowing data to be linked and analyzed comprehensively without compromising patient privacy and regulatory compliance.

LiveRamp

The LiveRamp integration accentuates the importance of secure data connectivity. A data connectivity platform, LiveRamp enables media entities and advertisers to use their data assets across the digital landscape for effective advertising targeting without sacrificing data privacy. It allows users to share and analyze data across different clouds and platforms while maintaining data privacy and compliance. It also enables users to perform advanced analytics, such as machine learning, on data from multiple sources.

TransUnion

TransUnion, a global information and insights powerhouse, adds a layer of risk assessment and decision-making capabilities to Databricks Clean Rooms. The integration with TransUnion enriches the arsenal of tools available to businesses, equipping them with accurate and insightful information for informed decision making, all the while safeguarding data privacy.

Collectively, these pivotal partnerships further strengthen the data privacy and security capabilities of Databricks Clean Rooms. By fusing cutting-edge technologies with a commitment to data protection, businesses can make impactful decisions, fortified by the assurance of stringent data privacy standards. In the pursuit of excellence in today’s data-centric landscape, Databricks Clean Rooms serve as trusted platforms. These platforms, strengthened by strategic partnerships, facilitate innovation while maintaining stringent data protection standards.
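
The tokenization idea mentioned for Datavant above can be illustrated with a generic keyed hash. The sketch below is not Datavant's algorithm or API. It assumes that both parties hold the same secret key (the hypothetical `SHARED_KEY`) and replace each identifier with an HMAC-SHA256 token, so that records can be linked on the token without either side seeing the other's raw identifiers. The dataset contents and function names are made up for illustration.

```python
import hashlib
import hmac


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(value.lower().split())


def tokenize(identifier: str, secret_key: bytes) -> str:
    """Map an identifier to an opaque token using HMAC-SHA256."""
    message = normalize(identifier).encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


# Hypothetical key; in practice it would be issued and rotated by a trusted party.
SHARED_KEY = b"example-key-not-for-production"

# Each party tokenizes its own records before they reach the clean room.
hospital = {
    tokenize(email, SHARED_KEY): diagnosis
    for email, diagnosis in [("Ana@Example.com", "asthma"), ("bo@example.com", "diabetes")]
}
pharmacy = {
    tokenize(email, SHARED_KEY): prescription
    for email, prescription in [("ana@example.com ", "inhaler"), ("cy@example.com", "statin")]
}

# Records are linked on tokens only; no raw identifier appears in the result.
linked = {
    token: (hospital[token], pharmacy[token])
    for token in hospital.keys() & pharmacy.keys()
}
print(len(linked), "record(s) linked")
```

A keyed hash is used instead of a plain hash so that someone without the key cannot rebuild the tokens by hashing a list of known email addresses.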

Industry Use Cases

Databricks Clean Rooms offer a wide range of applications across various industries. Their strong governance and data privacy enable collaborative data exploration in a secure environment, allowing multiple stakeholders to leverage their first-party data for analysis, while protecting proprietary information and upholding data privacy. The utility of Databricks Clean Rooms is demonstrated through diverse industry use cases, each showcasing the power of data sharing and collaboration within a framework that adheres to strict data privacy protocols:

Retail pioneering

The retail domain thrives on unified collaboration between retailers and suppliers. Databricks Clean Rooms are reshaping the retail landscape by serving as secure conduits for confidential information exchange. This exchange underpins demand forecasting, inventory planning, and supply chain optimization, which in turn elevates product availability, streamlines operations, and yields cost efficiencies.

Healthcare’s data nexus

In the healthcare sector, Databricks Clean Rooms are custodians of sensitive healthcare data. Collaborators seamlessly meld and query diverse data sources, culminating in a nuanced understanding for real-world evidence (RWE) applications. From regulatory decision making to clinical trial design and observational research, data privacy remains sacred, fostering an environment of ethical and secure innovation.

Media’s secure haven

Databricks Clean Rooms are catalyzing a transformative shift in the media industry by enabling secure sharing of audience data among media companies, advertisers, and partners, allowing for comprehensive analysis without infringing on user privacy and opening a new realm of collaborative insights, all within the confines of stringent data privacy regulations.

Financial integrity

In financial services, Databricks Clean Rooms align with the stringent Know Your Customer (KYC) standards. This collaboration augments the fight against financial malfeasance, facilitating comprehensive transaction investigations through collaborative analytics. In these secure environments, a holistic view of transactions materializes, allowing financial entities to address challenges with rigor and precision.

Driving automotive innovation

Databricks Clean Rooms are fostering synergy in the automotive industry by enabling secure data collaboration between manufacturers and suppliers, serving as crucial hubs for confidential data exchange and driving collaborative efforts in product development and supply chain optimization, all while ensuring the integrity of data.

Telecommunications unveiled

In the telecommunications sector, new paths of collaboration are being unlocked as carriers and service providers come together within secure environments to fuel cooperative efforts in optimizing networks and enhancing customer experiences, all while preserving the integrity of data privacy.

Public sector cohesion

Government agencies benefit from Databricks Clean Rooms by fostering a secure platform for data exchange with private sector counterparts, powering policy development and elevating service delivery within a secure and guarded environment.

Empowering energy

Energy companies and regulators connect within Databricks Clean Rooms to utilize secure data sharing for collaborative pursuits in energy production and distribution, steering the industry toward a sustainable future while upholding the sanctity of data privacy.

Educational advancement

Databricks Clean Rooms redefine educational collaboration, as academic institutions securely exchange insights with peers and government bodies, nurturing education policy development and service delivery while upholding the tenets of data privacy.

Organizations across various sectors are harnessing the power of data exploration, brought together by secure platforms. As they navigate this terrain, their commitment to the highest data privacy standards remains steadfast. This journey is steering innovation toward a future in which insights are seamlessly integrated and data protection is of the utmost importance.

Implementing Clean Rooms

In this section, you will learn how to implement Databricks Clean Rooms and how they provide a comprehensive set of tools to build, serve, and deploy a data clean room based on your data privacy and governance requirements. Before setting up a data clean room, clearly define your objectives and use cases. Ensure that you classify and segment your data based on sensitivity levels, access requirements, and compliance considerations. Also, plan to implement robust security measures to protect the data clean room and the sensitive data it houses. The following steps describe how to implement Databricks Clean Rooms:

1. Set up the Databricks Data Intelligence Platform.

The first step in implementing a Databricks clean room is to set up the Databricks Data Intelligence Platform, which provides a comprehensive set of tools to build, serve, and deploy a scalable and flexible data clean room based on your data privacy and governance requirements.

2. Create a clean room and invite participants.

Next, create a clean room within the Databricks Data Intelligence Platform and specify the clean room participants. The isolated environment in which all jobs are executed is created automatically. For privacy and security reasons, no collaborator can access this workspace.

3. Share data.

Participants can share their data with the clean room by uploading it to the Databricks Data Intelligence Platform. Data is not stored in the clean room but is instead shared into it. No users have direct access to the datasets; they can access them only through approved notebooks or code. Data sharing among collaborators is secure and private: only table metadata is visible to collaborators, while the raw data remains hidden and inaccessible.

4. Run computations.

Once data has been shared and access controls have been set up, participants can run computations on the data within the clean room. Computations can be written in languages such as SQL or Python, enabling simple use cases such as joins and crosswalks as well as complex computations such as machine learning.
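
The flow in the steps above (share tables, expose only metadata, and run only approved code) can be sketched with a toy in-memory model. This is not the Databricks Clean Rooms API. The `ToyCleanRoom` class, the table and participant names, and the `MIN_GROUP_SIZE` threshold are all hypothetical and exist only to show the shape of the idea.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

Row = Dict[str, object]
Computation = Callable[[Dict[str, List[Row]]], object]


@dataclass
class ToyCleanRoom:
    """Hypothetical in-memory stand-in for a clean room, for illustration only."""

    participants: Set[str]
    _tables: Dict[str, List[Row]] = field(default_factory=dict, init=False)
    _computations: Dict[str, Computation] = field(default_factory=dict, init=False)
    _approvals: Dict[str, Set[str]] = field(default_factory=dict, init=False)

    def share_table(self, owner: str, name: str, rows: List[Row]) -> None:
        if owner not in self.participants:
            raise PermissionError(f"{owner} is not a participant")
        self._tables[name] = rows

    def table_metadata(self, name: str) -> List[str]:
        """Expose column names only, never row values."""
        rows = self._tables[name]
        return sorted(rows[0].keys()) if rows else []

    def register(self, name: str, fn: Computation) -> None:
        self._computations[name] = fn
        self._approvals[name] = set()

    def approve(self, participant: str, name: str) -> None:
        if participant not in self.participants:
            raise PermissionError(f"{participant} is not a participant")
        self._approvals[name].add(participant)

    def run(self, name: str) -> object:
        """Run a computation only after every participant has approved it."""
        if self._approvals[name] != self.participants:
            raise PermissionError(f"{name} is not approved by all participants")
        return self._computations[name](self._tables)


MIN_GROUP_SIZE = 2  # hypothetical threshold: suppress groups smaller than this


def overlap_by_region(tables: Dict[str, List[Row]]) -> Dict[str, int]:
    """Count exposure rows that match a purchaser, reported per region."""
    buyers = {row["customer_id"] for row in tables["retailer_purchases"]}
    counts: Dict[str, int] = {}
    for row in tables["advertiser_exposures"]:
        if row["customer_id"] in buyers:
            counts[row["region"]] = counts.get(row["region"], 0) + 1
    return {region: n for region, n in counts.items() if n >= MIN_GROUP_SIZE}


room = ToyCleanRoom(participants={"retailer", "advertiser"})
room.share_table("retailer", "retailer_purchases",
                 [{"customer_id": i} for i in (1, 2, 3, 4)])
room.share_table("advertiser", "advertiser_exposures", [
    {"customer_id": 1, "region": "east"},
    {"customer_id": 2, "region": "east"},
    {"customer_id": 3, "region": "west"},
    {"customer_id": 9, "region": "west"},
])
room.register("overlap_by_region", overlap_by_region)
room.approve("retailer", "overlap_by_region")
room.approve("advertiser", "overlap_by_region")
result = room.run("overlap_by_region")
print(result)
```

The computation returns only aggregated counts, and it drops any region with fewer matches than the threshold, so a single customer cannot be singled out in the output.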

Best Practices

In addition to setting up access controls within the clean room to ensure that only authorized users can access and process data, you should observe these best practices for using Databricks Clean Rooms to ensure data privacy and security:

Monitor data usage.

Monitor the usage of data within the clean room to ensure that the data is being used in compliance with data privacy and security policies. This can be done using tools such as audit logs and data usage reports.

Encrypt data.

Encrypt data at rest and in transit to ensure that it is protected from unauthorized access. This can be done using encryption tools provided by the Databricks Data Intelligence Platform or third-party encryption tools.

Implement data retention policies.

Implement data retention policies to ensure that data is not retained for longer than necessary. This can help to minimize the risk of data breaches and ensure compliance with data privacy regulations.

Regularly review and update security measures.

Regularly review and update security measures to ensure that they are effective in protecting data privacy and security. This can include updating access controls, monitoring tools, encryption tools, and data retention policies.

Define clear data sharing policies.

Define clear data sharing policies that outline the terms and conditions under which data can be shared within the clean room. This can help to ensure that all participants understand their rights and responsibilities when it comes to data sharing.

Provide training and support.

Provide training and support to clean room participants to help them understand how to use the clean room effectively. This can include training on how to share data, how to set up access controls, and how to run computations on the data within the clean room.

Leverage partner integrations.

Take advantage of partner integrations such as Habu, Datavant, LiveRamp, and TransUnion to enhance the capabilities of your Databricks Clean Room. These partners provide tools and technologies that can help you to improve data privacy and security within the clean room.

Regularly review and update data sharing agreements.

Regularly review and update data sharing agreements with clean room participants to ensure that they remain relevant and up to date. This can help to ensure that data sharing within the clean room remains compliant with data privacy regulations.
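
Two of the practices above, monitoring data usage and enforcing retention policies, lend themselves to simple automated checks. The following sketch assumes a simplified audit record with hypothetical `user` and `action` fields, a made-up list of approved actions, and a 90-day retention window. Real audit logs have a different and richer schema, so treat this as an outline of the logic rather than a working integration.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

# Hypothetical policy values, chosen only for illustration.
APPROVED_ACTIONS = {"run_approved_notebook", "view_table_metadata"}
RETENTION = timedelta(days=90)


def flag_unapproved(events: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return audit events whose action is not on the approved list."""
    return [event for event in events if event["action"] not in APPROVED_ACTIONS]


def past_retention(shared_at: Dict[str, datetime], now: datetime) -> List[str]:
    """Return names of tables that have been shared longer than the retention window."""
    return sorted(name for name, ts in shared_at.items() if now - ts > RETENTION)


events = [
    {"user": "analyst@retailer.example", "action": "run_approved_notebook"},
    {"user": "analyst@advertiser.example", "action": "export_raw_rows"},
]
shared_at = {
    "retailer_purchases": datetime(2024, 1, 1, tzinfo=timezone.utc),
    "advertiser_exposures": datetime(2024, 5, 1, tzinfo=timezone.utc),
}
now = datetime(2024, 6, 1, tzinfo=timezone.utc)

flagged = flag_unapproved(events)
expired = past_retention(shared_at, now)
print("events to review:", flagged)
print("tables past retention:", expired)
```

Checks like these could run on a schedule, with flagged events and expired tables routed to whoever owns the data sharing agreement.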

Summary

This chapter provided an understanding of Databricks Clean Rooms, an innovative technology designed to protect data while enabling effective collaboration and analysis. The concept of clean rooms was explained, illustrating how they offer a secure, governed, and privacy-safe environment for data analysis. You learned about key partnerships that enhance the capabilities of Databricks Clean Rooms, along with various industry use cases that demonstrate the versatility and applicability of clean rooms across sectors. You also learned about steps for getting started with and implementing Databricks Clean Rooms, along with best practices for using clean rooms effectively. Clean rooms have the potential to evolve by integrating advanced techniques, such as differential privacy, federated learning, or homomorphic encryption, for robust data privacy and security. They could also provide more granular and dynamic control over data access and usage, enabling participants to modify their data sharing policies in response to changing business needs or regulatory requirements. Furthermore, they could support more complex and collaborative scenarios.

Databricks Clean Rooms represent a significant advancement in data privacy and collaboration. They enable businesses to leverage their data effectively while ensuring compliance with privacy regulations. As you navigate the evolving landscape of data privacy, clean rooms will play an increasingly important role in enabling secure and effective data analysis. This makes clean rooms an exciting area to watch for anyone interested in data privacy and collaboration. In Chapter 5, you will shift your focus to the strategic aspect of data collaboration, and you’ll be guided through the process of developing an effective strategy for data collaboration.

1 Interactive Advertising Bureau (IAB), State of Data 2023: Data Clean Rooms and the Democratization of Data in the Privacy-Centric Ecosystem, January 24, 2023, https://oreil.ly/OBVgm.
