Share data securely using Delta Sharing
This article introduces Delta Sharing in Azure Databricks, the secure data sharing platform that lets you share data in Azure Databricks with users outside your organization.
The Delta Sharing articles on this site focus on sharing Azure Databricks data and notebooks. Delta Sharing is also available as an open-source project that you can use to share Delta tables from other platforms. Delta Sharing also provides the backbone for Databricks Marketplace, an open forum for exchanging data products.
If you are a data recipient who has been granted access to shared data through Delta Sharing, and you just want to learn how to access that data, see Access data shared with you using Delta Sharing.
What is Delta Sharing?
Delta Sharing is an open protocol developed by Databricks for secure data sharing with other organizations regardless of the computing platforms they use. Azure Databricks builds Delta Sharing into its Unity Catalog data governance platform, enabling an Azure Databricks user, called a data provider, to share data with a person or group outside of their organization, called a data recipient.
Delta Sharing’s native integration with Unity Catalog allows you to manage, govern, audit, and track usage of the shared data on one platform. In fact, your data must be registered in Unity Catalog to be available for secure sharing. Data must also be in the Delta table format.
The primary concepts underlying Delta Sharing in Azure Databricks are shares and recipients.
What is a share?
In Delta Sharing, a share is a read-only collection of tables and table partitions to be shared with one or more recipients. If your recipient uses a Unity Catalog-enabled Databricks workspace, you can also include notebook files in a share.
A share is a securable object registered in Unity Catalog. A share can contain tables, views, and notebook files from a single Unity Catalog metastore. You can add or remove tables, views, and notebook files from a share at any time, and you can assign or revoke data recipient access to a share at any time.
If you remove a share from your Unity Catalog metastore, all recipients of that share lose the ability to access it.
What is a recipient?
A recipient is an object that associates an organization with a credential or secure sharing identifier that allows that organization to access one or more shares.
As a data provider (sharer), you can define multiple recipients for any given Unity Catalog metastore, but if you want to share data from multiple metastores with a particular user or group of users, you must define the recipient separately for each metastore. A recipient can have access to multiple shares.
If you delete a recipient from your Unity Catalog metastore, that recipient loses access to all shares it could previously access.
The way you use Delta Sharing depends on who you are sharing data with:
- Open sharing lets you share data with any user, whether or not they have access to Azure Databricks.
- Databricks-to-Databricks sharing lets you share data with Azure Databricks users who have access to a Unity Catalog metastore that is different from yours. Databricks-to-Databricks also supports notebook sharing, which is not available in open sharing.
If you want to share data with users outside of your Azure Databricks workspace, regardless of whether they use Databricks, you can use open Delta Sharing to share your data securely. As a data provider, you generate a token and share it securely with the recipient. They use the token to authenticate and get read access to the tables you’ve included in the shares you’ve given them access to.
Recipients can access the shared data using many computing tools and platforms, including:
- Azure Databricks
- Apache Spark
- Power BI
For a full list of Delta Sharing connectors and information about how to use them, see the Delta Sharing documentation.
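As an illustration of the token-based flow described above, the following minimal Python sketch reads a credential file and builds the address of a shared table. It assumes the credential file follows the profile format of the open-source Delta Sharing project (a JSON document with shareCredentialsVersion, endpoint, and bearerToken fields). The endpoint, token, and helper names are hypothetical placeholders, not values from this article.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical example values. A real credential file is downloaded by the
# recipient from the activation link that the provider sends.
EXAMPLE_PROFILE = {
    "shareCredentialsVersion": 1,
    "endpoint": "https://sharing.example.com/delta-sharing/",
    "bearerToken": "<token>",
}

REQUIRED_FIELDS = ("shareCredentialsVersion", "endpoint", "bearerToken")


def load_profile(path):
    """Read a credential file and check that the required fields are present."""
    profile = json.loads(Path(path).read_text())
    missing = [field for field in REQUIRED_FIELDS if field not in profile]
    if missing:
        raise ValueError(f"credential file is missing fields: {missing}")
    return profile


def table_url(profile_path, share, schema, table):
    """Build the '<profile-path>#<share>.<schema>.<table>' address of a shared table."""
    return f"{profile_path}#{share}.{schema}.{table}"
```

If the open-source delta-sharing Python connector is installed, the resulting address can be passed to delta_sharing.load_as_pandas to read the table into a pandas DataFrame.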
If you want to share data with users who don’t have access to your Unity Catalog metastore, you can use Databricks-to-Databricks Delta Sharing, as long as the recipients have access to a Databricks workspace that is enabled for Unity Catalog. Databricks-to-Databricks sharing lets you share data with users in other Databricks accounts, whether they’re on AWS, Azure, or GCP. It’s also a great way to securely share data across different Unity Catalog metastores in your own Databricks account.
One advantage of this scenario is that the share recipient doesn’t need a token to access the share, and the provider doesn’t need to manage recipient tokens. The security of the sharing connection—including all identity verification, authentication, and auditing—is managed entirely through Delta Sharing and the Databricks platform. Another advantage is the ability to share Databricks notebook files.
How do admins set up Delta Sharing?
Databricks-to-Databricks sharing between Unity Catalog metastores in the same account is always enabled. To enable Delta Sharing to share data with Databricks workspaces in other accounts or non-Databricks clients, an Azure Databricks account admin or metastore admin performs the following setup steps (at a high level):
1. Enable Delta Sharing for the Unity Catalog metastore that manages the data you want to share.
2. Create a share that includes one or more tables in the metastore.
   If you plan to use Databricks-to-Databricks sharing, you can also add notebook files to a share.
3. Create a recipient.
   - If your recipient is not a Databricks user, or does not have access to a Databricks workspace that is enabled for Unity Catalog, you must use open sharing. A set of token-based credentials is generated for that recipient.
   - If your recipient has access to a Databricks workspace that is enabled for Unity Catalog, you can use Databricks-to-Databricks sharing, and no token-based credentials are required. You request a sharing identifier from the recipient and use it to establish the secure connection.
   Use yourself as a test recipient to try out the setup process.
4. Grant the recipient access to one or more shares.
   This step can also be performed by a non-admin user with the SET SHARE PERMISSION privileges. See Unity Catalog privileges and securable objects.
5. Send the recipient the information they need to connect to the share (open sharing only).
   - For open sharing, use a secure channel to send the recipient an activation link that allows them to download their token-based credentials.
   - For Databricks-to-Databricks sharing, the data included in the share becomes available in the recipient’s Databricks workspace as soon as you grant them access to the share.
The recipient can now access the shared data.
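The share, recipient, and grant steps above can also be expressed as SQL statements. The following is a minimal sketch, not a definitive implementation: it only assembles the statements as strings, and the function name and all object names are hypothetical. It assumes the identifiers are trusted, because nothing is quoted or escaped. Enabling Delta Sharing on the metastore is not covered by the sketch.

```python
def provider_setup_sql(share, tables, recipient, sharing_identifier=None):
    """Return SQL statements that create a share, a recipient, and a grant.

    If sharing_identifier is given, the recipient is created for
    Databricks-to-Databricks sharing; otherwise it is created for open
    (token-based) sharing.
    """
    # Create the share and add each table to it.
    statements = [f"CREATE SHARE IF NOT EXISTS {share}"]
    statements += [f"ALTER SHARE {share} ADD TABLE {table}" for table in tables]

    # Create the recipient, with or without a sharing identifier.
    if sharing_identifier:
        statements.append(
            f"CREATE RECIPIENT IF NOT EXISTS {recipient} "
            f"USING ID '{sharing_identifier}'"
        )
    else:
        statements.append(f"CREATE RECIPIENT IF NOT EXISTS {recipient}")

    # Grant the recipient read access to the share.
    statements.append(f"GRANT SELECT ON SHARE {share} TO RECIPIENT {recipient}")
    return statements
```

In a notebook, each returned statement could be passed to spark.sql in order. For example, provider_setup_sql("sales_share", ["main.sales.orders"], "partner") yields four statements for an open sharing recipient.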
How do recipients access the shared data?
Recipients access shared tables in read-only format. Shared notebook files are read-only, but they can be cloned and then modified and run in the recipient workspace just like any other notebook.
Secure access depends on the sharing model:
- Open sharing: The recipient provides the credential whenever they access the data in their tool of choice, including Apache Spark, pandas, Power BI, Databricks, and many more. See Read data shared using Delta Sharing open sharing.
- Databricks-to-Databricks: The recipient accesses the data using Databricks. They can use Unity Catalog to grant and deny access to other users in their Databricks account. See Read data shared using Databricks-to-Databricks Delta Sharing.
Whenever the data provider updates data tables in their own Databricks account, the updates appear in near real time in the recipient’s system.
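To make the two access models more concrete, here is a hedged Python sketch. For open sharing, it builds (without sending) the List Shares request defined by the open Delta Sharing protocol, which authenticates with the bearer token from the credential file. For Databricks-to-Databricks sharing, it assembles the SQL statement a recipient could run to create a catalog from a share. The function names, endpoint, token, and object names are hypothetical.

```python
import urllib.request


def list_shares_request(endpoint, bearer_token):
    """Build, but do not send, a List Shares request for an open sharing endpoint."""
    url = endpoint.rstrip("/") + "/shares"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {bearer_token}"}
    )


def mount_share_sql(catalog, provider, share):
    """Return SQL that creates a catalog from a share in the recipient's metastore."""
    return f"CREATE CATALOG IF NOT EXISTS {catalog} USING SHARE {provider}.{share}"
```

In practice, open sharing recipients would normally rely on a connector rather than calling the REST endpoint directly. The request is shown only to illustrate that the credential accompanies every access.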
How do you keep track of who is sharing and accessing shared data?
Data providers can use Azure Databricks audit logging to monitor the creation and modification of shares and recipients, and can monitor recipient activity on shares. See Audit and monitor data sharing using Delta Sharing (for providers).
Data recipients who use shared data in a Databricks account can use Databricks audit logging to understand who is accessing which data. See Audit and monitor data access using Delta Sharing (for recipients).
Delta Sharing and streaming
Delta Sharing supports Spark Structured Streaming. A provider can share a table with history so that a recipient can use it as a Structured Streaming source, processing shared data incrementally with low latency. Recipients can also perform Delta Lake time travel queries on tables shared with history.
To learn how to share tables with history, see Add tables to a share. To learn how to use shared tables as streaming sources, see Query a table using Apache Spark Structured Streaming (for recipients of Databricks-to-Databricks sharing) or Access a shared table using Spark Structured Streaming (for recipients of open sharing data).
See also Streaming on Azure Databricks.
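As a rough sketch of the open sharing case, a recipient could read a table shared with history as a streaming source along the following lines. The helper name is hypothetical, and the sketch assumes an existing SparkSession with the Delta Sharing connector available and a table address of the form <profile-path>#<share>.<schema>.<table>.

```python
def read_shared_stream(spark, table_url, starting_version=None):
    """Return a streaming DataFrame that reads a shared table incrementally.

    spark is an existing SparkSession. table_url points at a table that the
    provider has shared with history.
    """
    reader = spark.readStream.format("deltaSharing")
    if starting_version is not None:
        # Start from a specific table version instead of the latest snapshot.
        reader = reader.option("startingVersion", str(starting_version))
    return reader.load(table_url)
```

The returned DataFrame can then be used with writeStream like any other Structured Streaming source.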
- Only tables and views stored in a Unity Catalog metastore can be shared using Delta Sharing.
- Only tables in Delta format are supported. You can easily convert Parquet tables to Delta—and back again. See CONVERT TO DELTA.
- View sharing is supported only in Databricks-to-Databricks sharing. Shareable views must be defined on Delta tables or other shareable views. For details, see (for providers) Add views to a share and (for consumers) Read shared views.
- There are limits on the number of files in metadata allowed for a shared table. To learn more, see Resource limit exceeded errors.
- Schemas named information_schema cannot be imported into a Unity Catalog metastore, because that schema name is reserved in Unity Catalog.
The values below indicate the quotas for Delta Sharing resources.
If you expect to exceed these resource limits, contact your Azure Databricks account representative.