What is Azure Databricks Clean Rooms?
Important
This feature is in Public Preview.
This article introduces Clean Rooms, an Azure Databricks feature that uses Delta Sharing and serverless compute to provide a secure and privacy-protecting environment where multiple parties can work together on sensitive enterprise data without direct access to each other’s data.
Requirements
To be eligible to use clean rooms, you must:
- Have an account that is enabled for serverless compute. See Enable serverless compute.
- Have a workspace that is enabled for Unity Catalog. See Enable a workspace for Unity Catalog.
How does Clean Rooms work?
When you create a clean room, you create the following:
- A securable clean room object in your Unity Catalog metastore.
- The “central” clean room, which is an isolated ephemeral environment managed by Databricks.
- A securable clean room object in your collaborator’s Unity Catalog metastore.
Tables, volumes (non-tabular data), and notebooks that either collaborator shares in the clean room are shared with the central clean room only, using Delta Sharing.
Collaborators cannot see the data in other collaborators’ tables and volumes, but they can see column names and column types, and they can run approved notebook code that operates over the tables and volumes. The notebook code runs in the central clean room. Notebooks can also generate output tables that let your collaborator temporarily save read-only output to their Unity Catalog metastore so they can work with it in their workspaces.
How does Clean Rooms ensure a no-trust environment?
The Databricks Clean Rooms model is “no-trust.” All collaborators in a no-trust clean room have equal privileges, including the creator of the clean room. Clean Rooms is designed to prevent the running of unauthorized code and the unauthorized sharing of data. For example, all collaborators must approve a notebook before it can be run. This approval is enforced implicitly: a collaborator cannot run a notebook that they created themselves, and can only run a notebook created by the other collaborator.
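The run rule described above can be modeled in a few lines. This is a minimal illustrative sketch, not the Clean Rooms API: the `Notebook` class, the `can_run` function, and the collaborator names are hypothetical and exist only to show the logic.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Notebook:
    """Hypothetical stand-in for a notebook shared in a clean room."""
    name: str
    created_by: str  # collaborator (organization) that added the notebook


def can_run(notebook: Notebook, runner: str, collaborators: frozenset[str]) -> bool:
    """Return True only if `runner` is a collaborator who did not create the notebook."""
    if runner not in collaborators or notebook.created_by not in collaborators:
        return False
    # Running the other party's notebook is what signals approval;
    # running your own notebook is never allowed.
    return runner != notebook.created_by


collaborators = frozenset({"org_a", "org_b"})
analysis = Notebook(name="overlap_analysis", created_by="org_a")

print(can_run(analysis, "org_b", collaborators))  # True
print(can_run(analysis, "org_a", collaborators))  # False
```

Because a notebook's author can never run it, every notebook run is of code that the running party has had the chance to review.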
Additional safeguards or restrictions
The following safeguards are in place in addition to the implicit notebook approval process mentioned above:
- After a clean room is created, it is locked down to prevent new collaborators from joining the clean room.
- If any collaborator deletes the clean room, the central clean room is void and no clean room tasks can be run by any user.
- During the public preview, each clean room is limited to two collaborators.
- You cannot rename the clean room.
- The clean room name must be unique in every collaborator’s metastore, so that all collaborators can refer to the same clean room unambiguously.
- Comments on the clean room securable in each collaborator’s workspace are not propagated to other collaborators.
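Several of the safeguards above can be summarized as a small state model. The sketch below is a toy illustration under stated assumptions, not how Databricks implements clean rooms: `CleanRoomModel`, `CleanRoomError`, and all method names are hypothetical.

```python
MAX_COLLABORATORS = 2  # limit stated for the public preview


class CleanRoomError(Exception):
    """Raised when an operation violates a clean room safeguard."""


class CleanRoomModel:
    """Hypothetical model of clean room lifecycle rules."""

    def __init__(self, name, collaborators):
        if len(collaborators) > MAX_COLLABORATORS:
            raise CleanRoomError("too many collaborators")
        self._name = name  # exposed read-only: the clean room cannot be renamed
        self._collaborators = tuple(collaborators)
        self._deleted = False

    @property
    def name(self):
        return self._name

    def add_collaborator(self, collaborator):
        # The clean room is locked down after creation.
        raise CleanRoomError("clean room is locked; no new collaborators")

    def delete(self, by):
        if by not in self._collaborators:
            raise CleanRoomError("only a collaborator can delete the clean room")
        self._deleted = True

    def run_task(self, by):
        # Deletion by any collaborator voids the clean room for everyone.
        if self._deleted:
            raise CleanRoomError("clean room is void")
        if by not in self._collaborators:
            raise CleanRoomError("not a collaborator")
        return f"task run by {by}"
```

In this model, calling `delete` as either collaborator makes every later `run_task` call fail, mirroring the rule that deletion by one party voids the central clean room for all users.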
What is shared with other collaborators?
- Clean room name.
- Cloud and region of the central clean room.
- Your organization name (which can be any name you choose).
- Your clean room sharing identifier (global metastore ID + workspace ID + user email address).
- Aliases of shared tables or volumes.
- Column metadata (column name or alias and type).
- Notebooks (read-only).
- Output tables (read-only, temporary).
- Clean room events system table.
- Run history, including:
  - The name of the notebook that is being run.
  - The collaborator that ran the notebook (not the user).
  - The state of the notebook run.
  - The start time of the notebook run.
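The run history fields listed above can be pictured as a projection of a fuller internal record. The following sketch is purely illustrative: the `NotebookRun` class, its field names, the `user_email` attribute, and the `"SUCCEEDED"` state value are assumptions made for this example, not the actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class NotebookRun:
    """Hypothetical record of a notebook run inside one collaborator's workspace."""
    notebook_name: str
    collaborator: str   # the organization that ran the notebook
    user_email: str     # the individual user; not part of shared run history
    state: str
    start_time: datetime


def shared_run_history(run: NotebookRun) -> dict:
    """Keep only the fields that the list above says are shared."""
    return {
        "notebook_name": run.notebook_name,
        "collaborator": run.collaborator,
        "state": run.state,
        "start_time": run.start_time.isoformat(),
    }


run = NotebookRun(
    notebook_name="overlap_analysis",
    collaborator="org_b",
    user_email="analyst@example.com",
    state="SUCCEEDED",
    start_time=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
)
print(shared_run_history(run))
```

The point of the projection is that the other collaborator learns which organization ran a notebook, but not which individual user did.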
What is shared with the central clean room?
- Everything that is listed in the previous section.
- Read-only tables, volumes, and notebooks.
Tables and volumes are registered in the central clean room’s metastore with any supplied aliases. Tables, volumes, and notebooks are shared throughout the lifecycle of the clean room.
Limitations
During the public preview, the following limitations apply:
- The required Databricks Runtime version does not include service credential Scala libraries.
Resource quotas
Azure Databricks enforces resource quotas on all Clean Room securable objects. These quotas are listed in Resource limits. If you expect to exceed these resource limits, contact your Azure Databricks account team.
You can monitor your quota usage using the Unity Catalog resource quotas APIs. See Monitor your usage of Unity Catalog resource quotas.
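Once quota records have been retrieved, a small helper can flag the ones that are approaching their limit. The sketch below assumes each record is a dictionary with `quota_name`, `quota_count`, and `quota_limit` keys. These field names, the sample values, and the `quotas_near_limit` function are assumptions for illustration, so check the resource quotas API reference for the actual response shape. Fetching the records is deliberately left out.

```python
def quotas_near_limit(records, threshold=0.8):
    """Return (quota_name, usage_ratio) pairs at or above `threshold`, highest first."""
    flagged = []
    for record in records:
        limit = record["quota_limit"]
        if limit <= 0:
            continue  # skip records without a meaningful limit
        usage = record["quota_count"] / limit
        if usage >= threshold:
            flagged.append((record["quota_name"], round(usage, 2)))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical sample records; real quota names and limits will differ.
sample = [
    {"quota_name": "example-quota-a", "quota_count": 90, "quota_limit": 100},
    {"quota_name": "example-quota-b", "quota_count": 10, "quota_limit": 100},
]
print(quotas_near_limit(sample))  # [('example-quota-a', 0.9)]
```

A threshold of 0.8 is an arbitrary choice here; pick one that leaves enough time to contact your account team before a limit is reached.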