Create clean rooms

Important

This feature is in Public Preview. To request access, reach out to your Azure Databricks representative.

This article describes how to create a clean room, a secure and privacy-protecting environment where multiple parties can work together on sensitive enterprise data without direct access to each other’s data.

Before you begin

The privileges needed to use clean rooms vary depending on the task:

  • To create a clean room, you must have the CREATE CLEAN ROOM privilege or be a metastore admin. The creator is automatically assigned as the owner of the clean room in their Unity Catalog metastore.

  • To initiate participation in a clean room that is shared with you, you must be a metastore admin.

    When a clean room is shared, the collaborator organization’s metastore admin is automatically assigned ownership of the clean room. The metastore admin can reassign ownership to a non-metastore admin. As a data governance best practice, Databricks recommends that ownership be assigned to a group.

    If your workspace does not have a metastore admin assigned, you must assign the role. See Assign a metastore admin and Manage Unity Catalog object ownership.

  • To add and remove data assets and notebooks in a clean room, you must be the owner of the clean room or have the MODIFY CLEAN ROOM privilege on the clean room. Additionally, you and the owner of the clean room (if you are not the owner) must have SELECT on tables that you add and READ VOLUME on volumes that you add.

To learn about permission requirements for updating clean rooms and running tasks (notebooks) in clean rooms, see Manage clean rooms and Run notebooks in clean rooms.

You can create up to five clean rooms per metastore.

Step 1. Request the collaborator’s sharing identifier

Before you can create a clean room, you must have the Clean Room sharing identifier of the organization that you will be collaborating with. The sharing identifier is a string that consists of the organization’s global metastore ID + workspace ID + the contact’s username (email address). The collaborator can be in any cloud or region.

Reach out to the collaborator to request their sharing identifier.

The collaborator can get the sharing identifier using the instructions in Find your sharing identifier.

Step 2. Create a clean room

To create a clean room, you must use Catalog Explorer.

  1. In your Azure Databricks workspace, click Catalog.

  2. On the Quick access page, click the Clean Rooms > button.

    Alternatively, click the gear icon at the top of the Catalog pane and select Clean Rooms.

  3. Click Create Clean Room.

  4. On the Create Clean Room page, enter a user-friendly name for the clean room.

    The name cannot use spaces, periods, or forward slashes (/).

    You cannot change the clean room name once it’s saved. Use a name that the collaborator will find useful and descriptive.

  5. Select the cloud provider and region where the central clean room will be created.

    The cloud provider must be the same as that of your current workspace, but the region can be different. Consider your organization’s data residency or other policies when you make your selection.

  6. (Optional) Add a comment.

  7. Enter the collaborator’s Clean Room sharing identifier.

    See Step 1. Request the collaborator’s sharing identifier.

  8. Make note of the catalog names assigned to you (the creator) and the collaborator.

    All data assets added to the clean room will appear under that catalog in the central clean room, and can be referenced using that catalog in the Unity Catalog three-level namespace (<catalog>.<schema>.<table-etc>).

  9. Click Create Clean Room.
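
Because the clean room name cannot be changed after it is saved, it can help to check a candidate name against the naming rule in step 4 before you enter it. The following is a minimal sketch, not part of any Databricks API: the function name `forbidden_chars_in` and the sample names are hypothetical, and the check covers only the three restrictions stated above (spaces, periods, and forward slashes). The service might enforce other rules that this sketch does not know about.

```python
# Characters that the naming rule in step 4 disallows.
# Assumption: only the literal space character is checked, not tabs or newlines.
FORBIDDEN_CHARS = (" ", ".", "/")


def forbidden_chars_in(name: str) -> list[str]:
    """Return the disallowed characters found in a candidate clean room name.

    An empty list means the name passes the three documented checks.
    """
    return [ch for ch in FORBIDDEN_CHARS if ch in name]


if __name__ == "__main__":
    # Hypothetical candidate names, for illustration only.
    for candidate in ("retail_media_collab", "retail media.collab/2024"):
        found = forbidden_chars_in(candidate)
        status = "ok" if not found else f"contains disallowed characters {found}"
        print(f"{candidate!r}: {status}")
```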

Step 3. Add data assets and notebooks to the clean room

Either party in the clean room (the creator and the collaborator) can add tables, volumes, and notebooks to the clean room.

Permissions required:

  • You must be the owner or have the MODIFY CLEAN ROOM privilege on the clean room.

  • You and the clean room owner (if you are not the owner) must have SELECT on any table and READ VOLUME on any volume that you add, along with USE CATALOG and USE SCHEMA on the parent catalog and schema.

    The clean room owner must keep these privileges throughout the life of the clean room.
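
The data privileges listed above can be granted with standard Unity Catalog GRANT statements. The following sketch only builds the statement text for one table or volume. The helper names (`quote`, `grants_for_asset`) and the catalog, schema, object, and group names are hypothetical. The sketch does not cover the MODIFY CLEAN ROOM privilege, and it does not run anything: you would review the output and run it yourself as a principal who is allowed to grant these privileges.

```python
def quote(identifier: str) -> str:
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + identifier.replace("`", "``") + "`"


def grants_for_asset(kind: str, catalog: str, schema: str, name: str, principal: str) -> list[str]:
    """Build GRANT statements for one table or volume that will be added to a clean room.

    kind must be "table" (needs SELECT) or "volume" (needs READ VOLUME).
    USE CATALOG and USE SCHEMA are granted on the parent catalog and schema.
    """
    privilege = {"table": "SELECT", "volume": "READ VOLUME"}[kind]
    cat, sch, obj, who = (quote(part) for part in (catalog, schema, name, principal))
    return [
        f"GRANT USE CATALOG ON CATALOG {cat} TO {who}",
        f"GRANT USE SCHEMA ON SCHEMA {cat}.{sch} TO {who}",
        f"GRANT {privilege} ON {kind.upper()} {cat}.{sch}.{obj} TO {who}",
    ]


if __name__ == "__main__":
    # Hypothetical names: a table main.sales.orders and a group that owns the clean room.
    for statement in grants_for_asset("table", "main", "sales", "orders", "cleanroom-owners"):
        print(statement + ";")
```

Granting to a group rather than to an individual user follows the ownership recommendation given earlier in this article.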

Note

The instructions that follow assume that you are returning to an already-created clean room to add assets. If you just created a clean room for the first time, a wizard walks you through the addition of data assets and notebooks. The actual UI for adding these assets is the same, regardless of whether you are guided by the wizard or not.

To add assets:

  1. In your Azure Databricks workspace, click Catalog.

  2. On the Quick access page, click the Clean Rooms > button.

    Alternatively, click the gear icon at the top of the Catalog pane and select Clean Rooms.

  3. Find and click the name of the clean room you want to update.

  4. To add data assets (tables and volumes), click the + Add data assets button.

  5. Select the tables and volumes you want to share and click Add data assets.

    When you share a table or volume, you can optionally add an alias. The alias name will be the only name visible in the clean room.

    When you share a table, you can optionally add partition clauses that enable you to share only part of the table. For details about how to use partitions to limit what you share, see Specify table partitions to share.

  6. To add notebooks, click the + Add notebooks button and browse for the notebook you want to add.

    You can optionally give the notebook an alternative Notebook name.

    Notebooks that you share in clean rooms query data and run data analysis workloads on the tables and volumes that you and the other collaborator have added to the clean room.

    Notebooks operate on the principle of implicit approval: you cannot run notebooks that you create. Instead, you create the notebooks that your collaborator runs, and your collaborator creates the notebooks that you run.

    If you share a notebook that includes results, those results will be shared with your collaborator.

    Important

    Any notebook references to tables or volumes that were added to the clean room must use the catalog name assigned when the clean room was created (“creator” for data assets added by the clean room creator, and “collaborator” for data assets added by the invited collaborator). For example, a table added by the creator could be named creator.sales.california.

    Likewise, ensure that the notebook uses any aliases that were assigned to data assets in the clean room.
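
As an illustration of the naming requirement above, the following sketch builds a three-level table reference from the clean room catalog names quoted in this article ("creator" and "collaborator") and places it in a query string. The helper name `clean_room_table` and the query are hypothetical, and the schema and table names come from the creator.sales.california example. If a data asset was given an alias, pass the alias rather than the original name.

```python
# Catalog names that this article says are assigned when the clean room is created.
CLEAN_ROOM_CATALOGS = ("creator", "collaborator")


def clean_room_table(owner_catalog: str, schema: str, name: str) -> str:
    """Return <catalog>.<schema>.<name> for an asset in the clean room.

    name should be the alias if one was assigned when the asset was added.
    """
    if owner_catalog not in CLEAN_ROOM_CATALOGS:
        raise ValueError(
            f"expected one of {CLEAN_ROOM_CATALOGS}, got {owner_catalog!r}"
        )
    return f"{owner_catalog}.{schema}.{name}"


# Table added by the clean room creator, as in the example above.
table = clean_room_table("creator", "sales", "california")
query = f"SELECT COUNT(*) AS row_count FROM {table}"
print(query)

# In a Databricks notebook, where a SparkSession named `spark` is predefined,
# the query could then be run with: spark.sql(query)
```

A reference that uses the catalog name from your own workspace metastore instead of "creator" or "collaborator" is the kind of mistake that the check in this sketch is meant to catch before the notebook is shared.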