

What is Delta Sharing?

This article introduces Delta Sharing in Azure Databricks, the secure data sharing platform that lets you share data and AI assets in Azure Databricks with users outside your organization, whether those users use Databricks or not. Delta Sharing is also the basis for Databricks Marketplace, an open forum for exchanging data products, and Clean Rooms, a secure and privacy-protecting environment where multiple parties can work together on sensitive enterprise data.

Delta Sharing is also available as an open-source project that you can use to share Delta tables from other platforms.

Note

To learn how to access data that has been shared with you using Delta Sharing, see Access data shared with you using Delta Sharing (for recipients).

How does Delta Sharing work?

Delta Sharing is an open protocol developed by Databricks for secure data sharing with other organizations regardless of the computing platforms they use.

There are three ways to share data using Delta Sharing:

  1. The Databricks-to-Databricks sharing protocol, which lets you share data and AI assets from your Unity Catalog-enabled workspace with users who also have access to a Unity Catalog-enabled Databricks workspace.

    This approach uses the Delta Sharing server that is built into Azure Databricks. It supports some Delta Sharing features that are not supported in the other protocols, including notebook sharing, Unity Catalog volume sharing, Unity Catalog AI model sharing, Unity Catalog data governance, auditing, and usage tracking for both providers and recipients. The integration with Unity Catalog simplifies setup and governance for both providers and recipients and improves performance.

    See Share data using the Delta Sharing Databricks-to-Databricks protocol (for providers).

  2. The Databricks open sharing protocol, which lets you share tabular data that you manage in a Unity Catalog-enabled Databricks workspace with users on any computing platform.

    This approach uses the Delta Sharing server that is built into Azure Databricks and is useful when you manage data using Unity Catalog and want to share it with users who don’t use Databricks or don’t have access to a Unity Catalog-enabled Databricks workspace. The integration with Unity Catalog on the provider side simplifies setup and governance for providers.

    See Share data using the Delta Sharing open sharing protocol (for providers).

  3. A customer-managed implementation of the open-source Delta Sharing server, which lets you share from any platform to any platform, whether Databricks or not.

    The Azure Databricks documentation does not cover instructions for setting up your own Delta Sharing server. See github.com/delta-io/delta-sharing.
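At the wire level, the open protocol is a REST API: a client sends HTTPS requests to the provider's sharing endpoint and authenticates with a bearer token. As a minimal sketch, assuming the endpoint layout described in the protocol specification in the delta-io/delta-sharing repository, the Python below builds (but does not send) the requests for listing shares and for listing the tables in a schema. The `build_request` helper, endpoint URL, token, and share and schema names are hypothetical placeholders.

```python
import urllib.parse
import urllib.request


def build_request(endpoint: str, token: str, *path_parts: str) -> urllib.request.Request:
    """Build, but do not send, an authenticated GET request to a sharing server."""
    # Percent-encode each path segment so share, schema, and table names are URL-safe.
    path = "/".join(urllib.parse.quote(part, safe="") for part in path_parts)
    url = endpoint.rstrip("/") + "/" + path
    # The recipient's token travels as a bearer credential on every request.
    return urllib.request.Request(url, method="GET", headers={"Authorization": f"Bearer {token}"})


# Hypothetical endpoint and token; a real recipient reads these from its credential file.
ENDPOINT = "https://sharing.example.com/delta-sharing/"
TOKEN = "<bearer-token>"

list_shares = build_request(ENDPOINT, TOKEN, "shares")
list_tables = build_request(ENDPOINT, TOKEN, "shares", "sales_share", "schemas", "retail", "tables")
print(list_shares.full_url)  # https://sharing.example.com/delta-sharing/shares
print(list_tables.full_url)  # https://sharing.example.com/delta-sharing/shares/sales_share/schemas/retail/tables
```

In practice, recipients usually rely on a connector rather than calling these endpoints directly; the sketch only makes the request shape visible.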

Shares, providers, and recipients

The primary concepts underlying Delta Sharing in Azure Databricks are shares, providers, and recipients.

What is a share?

In Delta Sharing, a share is a read-only collection of tables and table partitions that a provider wants to share with one or more recipients. If your recipient uses a Unity Catalog-enabled Databricks workspace, you can also include notebook files, views (including dynamic views that restrict access at the row and column level), Unity Catalog volumes, and Unity Catalog models in a share.

You can add or remove tables, views, volumes, models, and notebook files from a share at any time, and you can assign or revoke data recipient access to a share at any time.

In a Unity Catalog-enabled Azure Databricks workspace, a share is a securable object registered in Unity Catalog. If you remove a share from your Unity Catalog metastore, all recipients of that share lose the ability to access it.

See Create and manage shares for Delta Sharing.

What is a provider?

A provider is an entity that shares data with a recipient. If you are a provider and you want to take advantage of the built-in Databricks Delta Sharing server and manage shares and recipients using Unity Catalog, you need at least one Azure Databricks workspace that is enabled for Unity Catalog. You do not need to migrate all of your existing workspaces to Unity Catalog. You can simply create a new Unity Catalog-enabled workspace for your Delta Sharing needs.

If a recipient is on a Unity Catalog-enabled Databricks workspace, the provider is also a Unity Catalog securable object that represents the provider organization and associates that organization with a set of shares.

What is a recipient?

A recipient is an entity that receives shares from a provider. In Unity Catalog, a recipient is a securable object that represents an organization and associates it with a credential or secure sharing identifier that allows that organization to access one or more shares.

As a data provider (sharer), you can define multiple recipients for any given Unity Catalog metastore, but if you want to share data from multiple metastores with a particular user or group of users, you must define the recipient separately for each metastore. A recipient can have access to multiple shares.

If a provider deletes a recipient from their Unity Catalog metastore, that recipient loses access to all shares it could previously access.

See Create and manage data recipients for Delta Sharing.
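The relationships between shares, recipients, and grants can be summarized in a small in-memory model. This is a hypothetical sketch, not Unity Catalog's implementation: the `Metastore` class and its methods are invented for illustration. It shows that dropping a share removes every recipient's access to it, that deleting a recipient removes its access to all shares, and that recipients are scoped to one metastore, so sharing from a second metastore means defining the recipient again there.

```python
from dataclasses import dataclass, field


@dataclass
class Metastore:
    """Hypothetical model of one provider metastore's shares, recipients, and grants."""
    shares: dict[str, set[str]] = field(default_factory=dict)  # share name -> shared asset names
    recipients: set[str] = field(default_factory=set)          # recipients defined in THIS metastore
    grants: set[tuple[str, str]] = field(default_factory=set)  # (share, recipient) pairs

    def grant(self, share: str, recipient: str) -> None:
        # A recipient can only be granted shares in the metastore where it is defined.
        if share not in self.shares or recipient not in self.recipients:
            raise KeyError("share and recipient must both exist in this metastore")
        self.grants.add((share, recipient))

    def drop_share(self, share: str) -> None:
        # Removing a share revokes it for every recipient.
        self.shares.pop(share, None)
        self.grants = {(s, r) for s, r in self.grants if s != share}

    def drop_recipient(self, recipient: str) -> None:
        # Deleting a recipient revokes all of its shares.
        self.recipients.discard(recipient)
        self.grants = {(s, r) for s, r in self.grants if r != recipient}

    def visible_assets(self, recipient: str) -> set[str]:
        # Union of the assets in every share granted to this recipient.
        return {asset
                for s, r in self.grants if r == recipient
                for asset in self.shares.get(s, set())}


provider = Metastore()
provider.shares["sales"] = {"main.retail.orders", "main.retail.customers"}
provider.shares["marketing"] = {"main.ads.campaigns"}
provider.recipients.add("partner_a")
provider.grant("sales", "partner_a")
provider.grant("marketing", "partner_a")
print(sorted(provider.visible_assets("partner_a")))
# ['main.ads.campaigns', 'main.retail.customers', 'main.retail.orders']
provider.drop_share("sales")
print(sorted(provider.visible_assets("partner_a")))  # ['main.ads.campaigns']
```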

Open sharing versus Databricks-to-Databricks sharing

This section describes the two protocols for sharing from a Databricks workspace that is enabled for Unity Catalog.

Note

This section assumes that the provider is on a Unity Catalog-enabled Azure Databricks workspace. To learn about setting up an open-source Delta Sharing server to share from a non-Databricks platform or non-Unity Catalog workspace, see github.com/delta-io/delta-sharing.

The way a provider uses Delta Sharing in Azure Databricks depends on who they are sharing data with:

  • Open sharing lets you share data with any user, whether or not they have access to Azure Databricks.
  • Databricks-to-Databricks sharing lets you share data with Azure Databricks users whose workspace is attached to a Unity Catalog metastore that is different from yours. Databricks-to-Databricks also supports notebook, volume, and model sharing, which is not available in open sharing.

What is open Delta Sharing?

If you want to share data with users outside of your Azure Databricks workspace, regardless of whether they use Databricks, you can use open Delta Sharing to share your data securely. As a data provider, you generate a token and share it securely with the recipient. They use the token to authenticate and get read access to the tables you’ve included in the shares you’ve given them access to.
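The downloadable credential is a small JSON profile file. As a sketch based on the profile format documented for the open-source protocol (`shareCredentialsVersion`, `endpoint`, `bearerToken`, and an optional `expirationTime`), a recipient-side tool might load it as follows. The `SharingProfile` class and `load_profile` function are hypothetical, and the sample values are placeholders.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SharingProfile:
    endpoint: str
    bearer_token: str
    expiration_time: Optional[str] = None


def load_profile(text: str) -> SharingProfile:
    """Parse the JSON profile (credential) file that an open-sharing recipient downloads."""
    data = json.loads(text)
    # This sketch only understands version 1 (bearer-token) profiles.
    if data.get("shareCredentialsVersion") != 1:
        raise ValueError("unsupported shareCredentialsVersion")
    return SharingProfile(
        endpoint=data["endpoint"],
        bearer_token=data["bearerToken"],
        expiration_time=data.get("expirationTime"),  # optional field
    )


SAMPLE = """{
  "shareCredentialsVersion": 1,
  "endpoint": "https://sharing.example.com/delta-sharing/",
  "bearerToken": "<token>",
  "expirationTime": "2030-01-01T00:00:00.0Z"
}"""

profile = load_profile(SAMPLE)
print(profile.endpoint)  # https://sharing.example.com/delta-sharing/
```

Because the file contains the bearer token, recipients should store it as carefully as a password.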

Recipients can access the shared data using many computing tools and platforms, including:

  • Azure Databricks
  • Apache Spark
  • Pandas
  • Power BI

For a full list of Delta Sharing connectors and information about how to use them, see the Delta Sharing documentation.

See also Share data using the Delta Sharing open sharing protocol (for providers).
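Connectors identify a shared table by combining the profile file path with the table's three-level name, in the form `<profile-file>#<share>.<schema>.<table>`; for example, the open-source Python connector accepts such a URL in `delta_sharing.load_as_pandas`. The stdlib-only sketch below splits a URL of that form into its parts. The `parse_table_url` helper and the sample path and names are hypothetical.

```python
from typing import NamedTuple


class TableCoordinates(NamedTuple):
    profile_path: str
    share: str
    schema: str
    table: str


def parse_table_url(url: str) -> TableCoordinates:
    """Split '<profile-file>#<share>.<schema>.<table>' into its four parts."""
    # Split on the LAST '#', so the profile path itself may contain other characters.
    profile_path, sep, coords = url.rpartition("#")
    if not sep or not profile_path:
        raise ValueError(f"missing '#<share>.<schema>.<table>' in {url!r}")
    parts = coords.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected exactly <share>.<schema>.<table>, got {coords!r}")
    return TableCoordinates(profile_path, *parts)


print(parse_table_url("/tmp/config.share#sales_share.retail.orders"))
# TableCoordinates(profile_path='/tmp/config.share', share='sales_share', schema='retail', table='orders')
```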

What is Databricks-to-Databricks Delta Sharing?

If you want to share data with users who have a Databricks workspace that is enabled for Unity Catalog, you can use Databricks-to-Databricks Delta Sharing. Databricks-to-Databricks sharing lets you share data with users in other Databricks accounts, whether they’re on AWS, Azure, or GCP. It’s also a great way to securely share data across different Unity Catalog metastores in your own Databricks account. Note that there is no need to use Delta Sharing to share data between workspaces attached to the same Unity Catalog metastore, because in that scenario you can use Unity Catalog itself to manage access to data across workspaces.

One advantage of Databricks-to-Databricks sharing is that the share recipient doesn’t need a token to access the share, and the provider doesn’t need to manage recipient tokens. The security of the sharing connection—including all identity verification, authentication, and auditing—is managed entirely through Delta Sharing and the Databricks platform. Another advantage is the ability to share Databricks notebook files, views, Unity Catalog volumes, and Unity Catalog models.

See also Share data using the Delta Sharing Databricks-to-Databricks protocol (for providers).

How do provider admins set up Delta Sharing?

This section gives an overview of how providers can enable Delta Sharing and initiate sharing from a Unity Catalog-enabled Azure Databricks workspace. For open-source Delta Sharing, see github.com/delta-io/delta-sharing.

Databricks-to-Databricks sharing between Unity Catalog metastores in the same account is always enabled. If you are a provider who wants to enable Delta Sharing to share data with Databricks workspaces in other accounts or non-Databricks clients, an Azure Databricks account admin or metastore admin performs the following setup steps (at a high level):

  1. Enable Delta Sharing for the Unity Catalog metastore that manages the data you want to share.

    Note

    You do not need to enable Delta Sharing on your metastore if you intend to use Delta Sharing to share data only with users on other Unity Catalog metastores in your account. Metastore-to-metastore sharing within a single Azure Databricks account is enabled by default.

    See Enable Delta Sharing on a metastore.

  2. Create a share that includes data assets registered in the Unity Catalog metastore.

    If you are sharing with a non-Databricks recipient (known as open sharing), you can include tables in the Delta or Parquet format. If you plan to use Databricks-to-Databricks sharing, you can also add views, Unity Catalog volumes, Unity Catalog models, and notebook files to a share.

    See Create and manage shares for Delta Sharing.

  3. Create a recipient.

    See Create and manage data recipients for Delta Sharing.

    If your recipient is not a Databricks user, or does not have access to a Databricks workspace that is enabled for Unity Catalog, you must use open sharing. A set of token-based credentials is generated for that recipient.

    If your recipient has access to a Databricks workspace that is enabled for Unity Catalog, you can use Databricks-to-Databricks sharing, and no token-based credentials are required. You request a sharing identifier from the recipient and use it to establish the secure connection.

    Tip

    Use yourself as a test recipient to try out the setup process.

  4. Grant the recipient access to one or more shares.

    See Manage access to Delta Sharing data shares (for providers).

    Note

    This step can also be performed by a non-admin user with the USE SHARE, USE RECIPIENT and SET SHARE PERMISSION privileges. See Unity Catalog privileges and securable objects.

  5. Send the recipient the information they need to connect to the share (open sharing only).

    See Send the recipient their connection information.

    For open sharing, use a secure channel to send the recipient an activation link that allows them to download their token-based credentials.

    For Databricks-to-Databricks sharing, the data included in the share becomes available in the recipient’s Databricks workspace as soon as you grant them access to the share.

The recipient can now access the shared data.
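Steps 2 through 4 map onto Databricks SQL statements (`CREATE SHARE`, `ALTER SHARE ... ADD TABLE`, `CREATE RECIPIENT`, and `GRANT SELECT ON SHARE ... TO RECIPIENT`). As a sketch, assuming Delta Sharing is already enabled on the metastore and the caller holds the needed privileges, the Python below assembles those statements as strings that could be run in a SQL editor or passed to `spark.sql` in a notebook. The helper functions, the share, table, and recipient names, and the sharing identifier are hypothetical; check the SQL reference for the full syntax and options.

```python
from typing import Optional


def quote_ident(name: str) -> str:
    """Backtick-quote each dot-separated part, e.g. main.retail.orders -> `main`.`retail`.`orders`."""
    return ".".join("`" + part.replace("`", "``") + "`" for part in name.split("."))


def setup_statements(share: str, tables: list[str], recipient: str,
                     sharing_identifier: Optional[str] = None) -> list[str]:
    """Return SQL for steps 2-4: create a share, add tables, create a recipient, grant access."""
    stmts = [f"CREATE SHARE IF NOT EXISTS {quote_ident(share)}"]
    stmts += [f"ALTER SHARE {quote_ident(share)} ADD TABLE {quote_ident(t)}" for t in tables]
    if sharing_identifier is None:
        # Open sharing: token-based credentials are generated for this recipient.
        stmts.append(f"CREATE RECIPIENT IF NOT EXISTS {quote_ident(recipient)}")
    else:
        # Databricks-to-Databricks: identify the recipient by the sharing identifier it sent you.
        if "'" in sharing_identifier:
            raise ValueError("unexpected quote in sharing identifier")
        stmts.append(
            f"CREATE RECIPIENT IF NOT EXISTS {quote_ident(recipient)} USING ID '{sharing_identifier}'"
        )
    stmts.append(f"GRANT SELECT ON SHARE {quote_ident(share)} TO RECIPIENT {quote_ident(recipient)}")
    return stmts


# Hypothetical names; the identifier value is illustrative only.
for stmt in setup_statements("sales_share", ["main.retail.orders"], "partner_b",
                             sharing_identifier="azure:westeurope:<metastore-uuid>"):
    print(stmt + ";")
```

Step 1 (enabling Delta Sharing on the metastore) and step 5 (sending connection information) are not covered by this sketch.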

How do recipients access the shared data?

Recipients access shared data assets in read-only format. Shared notebook files are read-only, but they can be cloned and then modified and run in the recipient workspace just like any other notebook.

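Secure access depends on the sharing model:

  • Open sharing: the recipient authenticates with the token-based credential that the provider generated each time they access the shared data from their tool of choice.
  • Databricks-to-Databricks sharing: no token is required. The provider establishes the connection using the recipient's sharing identifier, and identity verification, authentication, and auditing are managed by Delta Sharing and the Databricks platform.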

Whenever the data provider updates data tables or volumes in their own Databricks account, the updates appear in near real time in the recipient’s system.

How do you keep track of who is sharing and accessing shared data?

Data providers on Unity Catalog-enabled Azure Databricks workspaces can use Azure Databricks audit logging and system tables to monitor the creation and modification of shares and recipients, and can monitor recipient activity on shares. See Audit and monitor data sharing.

Data recipients who use shared data in a Databricks workspace can use Databricks audit logging and system tables to understand who is accessing which data. See Audit and monitor data sharing.

Sharing volumes

You can share volumes using the Databricks-to-Databricks sharing flow. See Add volumes to a share (for providers) and Read data shared using Databricks-to-Databricks Delta Sharing (for recipients).

Sharing models

You can share models using the Databricks-to-Databricks sharing flow. See Add models to a share (for providers) and Read data shared using Databricks-to-Databricks Delta Sharing (for recipients).

Sharing notebooks

You can use Delta Sharing to share notebook files using the Databricks-to-Databricks sharing flow. See Add notebook files to a share (for providers) and Read shared notebooks (for recipients).

Restricting access at the row and column level

You can share dynamic views that restrict access to certain table data based on recipient properties. Dynamic view sharing requires the Databricks-to-Databricks sharing flow. See Add dynamic views to a share to filter rows and columns.
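In Databricks SQL, a dynamic view for sharing can compare a column with a recipient property through the `CURRENT_RECIPIENT('<property-key>')` function, so each recipient sees only its own slice of the data. The Python below does not use that function; it is a hypothetical emulation of the effect on in-memory rows, with made-up column and property names, showing row filtering combined with column masking.

```python
from typing import Any, Iterable, Mapping, Optional


def filter_for_recipient(
    rows: Iterable[Mapping[str, Any]],
    recipient_properties: Mapping[str, str],
    column: str,
    property_key: str,
    masked_columns: Iterable[str] = (),
) -> list[dict[str, Any]]:
    """Keep rows whose `column` equals the recipient's `property_key` value; null out masked columns."""
    wanted: Optional[str] = recipient_properties.get(property_key)
    masked = set(masked_columns)
    result = []
    for row in rows:
        # Row filter: a recipient without the property sees nothing.
        if wanted is None or row.get(column) != wanted:
            continue
        # Column mask: hide sensitive values instead of dropping the column.
        result.append({k: (None if k in masked else v) for k, v in row.items()})
    return result


orders = [
    {"order_id": 1, "region": "EMEA", "customer_email": "a@example.com"},
    {"order_id": 2, "region": "AMER", "customer_email": "b@example.com"},
]
print(filter_for_recipient(orders, {"region": "EMEA"}, "region", "region",
                           masked_columns=["customer_email"]))
# [{'order_id': 1, 'region': 'EMEA', 'customer_email': None}]
```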

Delta Sharing and streaming

Delta Sharing supports Spark Structured Streaming. A provider can share a table with history so that a recipient can use it as a Structured Streaming source, processing shared data incrementally with low latency. Recipients can also perform Delta Lake time travel queries on tables shared with history.

To learn how to share tables with history, see Add tables to a share. To learn how to use shared tables as streaming sources, see Query a table using Apache Spark Structured Streaming (for recipients of Databricks-to-Databricks sharing) or Access a shared table using Spark Structured Streaming (for recipients of open sharing data).

See also Streaming on Azure Databricks.
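Incremental processing works because a table shared with history exposes its sequence of versions, so a consumer only has to remember the last version it handled. The sketch below is a hypothetical, append-only simulation of that idea in plain Python; real Delta tables also record updates and deletes, and Structured Streaming manages its checkpoints itself. `SharedTableHistory` and `process_new_versions` are invented names, and the `snapshot` method mirrors time travel by reconstructing the table as of a given version.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SharedTableHistory:
    """Hypothetical append-only table history: version number -> rows added in that version."""
    versions: dict[int, list[dict]] = field(default_factory=dict)

    def commit(self, rows: list[dict]) -> int:
        version = max(self.versions, default=-1) + 1
        self.versions[version] = list(rows)
        return version

    def snapshot(self, version: int) -> list[dict]:
        # Time travel: the table as of `version` is every row committed up to that version.
        return [row for v in sorted(self.versions) if v <= version for row in self.versions[v]]


def process_new_versions(history: SharedTableHistory, checkpoint: int,
                         handle: Callable[[int, list[dict]], None]) -> int:
    """Handle only versions newer than `checkpoint` and return the updated checkpoint."""
    for version in sorted(history.versions):
        if version > checkpoint:
            handle(version, history.versions[version])
            checkpoint = version
    return checkpoint


history = SharedTableHistory()
history.commit([{"id": 1}])
history.commit([{"id": 2}])
checkpoint = process_new_versions(history, -1, lambda v, rows: print("batch", v, rows))
history.commit([{"id": 3}])
# Only version 2 is new, so only it is handled on the second pass.
checkpoint = process_new_versions(history, checkpoint, lambda v, rows: print("batch", v, rows))
print(checkpoint)  # 2
```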

Delta Lake feature support matrix

Delta Sharing supports most Delta Lake features when you share a table. This support matrix lists:

  • Delta features that require specific versions of Databricks Runtime, the open-source Delta Sharing Spark connector, or the open-source Delta Sharing Python connector.
  • Partially supported features.

Feature | Provider | Databricks recipient | Open source recipient
--- | --- | --- | ---
Deletion vectors | Sharing tables with this feature is in Public Preview. | Databricks Runtime 14.1+ for batch queries; Databricks Runtime 14.2+ for CDF and streaming queries | Delta Sharing Spark connector 3.1+; Delta Sharing Python connector 1.1.0+; Power BI v2.132.908.0+
Column mapping | Sharing tables with this feature is in Public Preview. | Databricks Runtime 14.1+ for batch queries; Databricks Runtime 14.2+ for CDF and streaming queries | Delta Sharing Spark connector 3.1+; Delta Sharing Python connector 1.1.0+; Power BI v2.132.908.0+
Uniform format | Sharing tables with this feature is in Public Preview. | Databricks Runtime 14.1+ for batch queries; Databricks Runtime 14.2+ for CDF and streaming queries | Delta Sharing Spark connector 3.1+; Delta Sharing Python connector 1.1.0+; Power BI v2.132.908.0+
V2 checkpoint | Supported with limitations | Supported with limitations | Supported with limitations
TimestampNTZ | Supported | Databricks Runtime 14.1+ | Delta Sharing Spark connector 3.3+
Liquid clustering | Supported with limitations | Supported with limitations | Supported with limitations
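A client could check its own version against this matrix before querying a table. The sketch below transcribes the minimum versions from the rows that list them; the feature and client keys (such as `dbr_cdf_streaming`) are hypothetical labels, and the helper returns `None` when the matrix lists no minimum, which includes the "supported with limitations" rows that it does not model.

```python
from itertools import zip_longest
from typing import Optional

# Minimum versions transcribed from the matrix; keys are hypothetical labels.
_PREVIEW_MINIMUMS = {
    "dbr": "14.1",                # Databricks Runtime, batch queries
    "dbr_cdf_streaming": "14.2",  # Databricks Runtime, CDF and streaming queries
    "spark_connector": "3.1",
    "python_connector": "1.1.0",
    "power_bi": "2.132.908.0",
}
MINIMUMS = {
    "deletion_vectors": _PREVIEW_MINIMUMS,
    "column_mapping": _PREVIEW_MINIMUMS,
    "uniform": _PREVIEW_MINIMUMS,
    "timestamp_ntz": {"dbr": "14.1", "spark_connector": "3.3"},
}


def _parse(version: str) -> list[int]:
    return [int(part) for part in version.strip().lstrip("vV").split(".")]


def version_at_least(version: str, minimum: str) -> bool:
    # Compare numerically, part by part, treating missing parts as 0 (so "1.1" == "1.1.0").
    for have, need in zip_longest(_parse(version), _parse(minimum), fillvalue=0):
        if have != need:
            return have > need
    return True


def supports(feature: str, client: str, version: str) -> Optional[bool]:
    """True/False if the matrix lists a minimum for this feature and client, else None."""
    minimum = MINIMUMS.get(feature, {}).get(client)
    return None if minimum is None else version_at_least(version, minimum)


print(supports("deletion_vectors", "dbr_cdf_streaming", "14.1"))  # False
print(supports("timestamp_ntz", "spark_connector", "3.3.0"))      # True
```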

Delta Sharing FAQs

The following are frequently asked questions about Delta Sharing.

Do I need Unity Catalog to use Delta Sharing?

No, you do not need Unity Catalog to share (as a provider) or consume shared data (as a recipient). However, Unity Catalog provides benefits such as support for non-tabular and AI asset sharing, out-of-the-box governance, simplicity, and query performance.

Providers can share data in two ways:

  • Put the assets to share under Unity Catalog management and share them using the built-in Azure Databricks Delta Sharing server.

    You do not need to migrate all assets to Unity Catalog. You need only one Azure Databricks workspace that is enabled for Unity Catalog to manage assets that you want to share. In some accounts, new workspaces are enabled for Unity Catalog automatically. See Automatic enablement of Unity Catalog.

  • Implement the open Delta Sharing server to share data, without necessarily using your Azure Databricks account.

Recipients can consume data in two ways:

  • Without a Databricks workspace. Use open source Delta Sharing connectors that are available for many data platforms, including Power BI, pandas, and open source Apache Spark. See Read data shared using Delta Sharing open sharing (for recipients) and the Delta Sharing open source project.

  • In a Databricks workspace. Recipient workspaces don’t need to be enabled for Unity Catalog, but there are advantages of governance, simplicity, and performance if they are.

    Recipient organizations who want these advantages don’t need to migrate all assets to Unity Catalog. You need only one Azure Databricks workspace that is enabled for Unity Catalog to manage assets that are shared with you. In some accounts, new workspaces are enabled for Unity Catalog automatically. See Automatic enablement of Unity Catalog.

See Read data shared using Delta Sharing open sharing (for recipients) and Read data shared using Databricks-to-Databricks Delta Sharing (for recipients).

Do I need to be a Databricks customer to use Delta Sharing?

No, Delta Sharing is an open protocol. You can share non-Databricks data with recipients on any data platform. Providers can configure an open Delta Sharing server to share from any computing platform. Recipients can consume shared data using open source Delta Sharing connectors for many data products, including Power BI, pandas, and open source Spark.

However, using Delta Sharing on Azure Databricks, especially sharing from a Unity Catalog-enabled workspace, has many advantages.

For details, see the first question in this FAQ.

Does Delta Sharing incur egress costs?

Delta Sharing within a region incurs no egress cost. Unlike other data sharing platforms, Delta Sharing does not require data replication. This model has many advantages, but it means that your cloud vendor may charge data egress fees when you share data across clouds or regions. Azure Databricks supports sharing from Cloudflare R2, which incurs no egress fees, and provides other tools and recommendations to monitor and avoid egress fees. See Monitor and manage Delta Sharing egress costs (for providers).

Can providers revoke recipient access?

Yes, recipient access can be revoked on-demand and at specified levels of granularity. You can deny recipient access to specific shares and specific IP addresses, filter tabular data for a recipient, revoke recipient tokens, and delete recipients entirely. See Revoke recipient access to a share and Create and manage data recipients for Delta Sharing.
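Restricting a recipient to specific IP addresses amounts to checking each request's source address against an allowed set of ranges. The sketch below shows that check with Python's `ipaddress` module; the `ip_allowed` helper, the convention that an empty list means "unrestricted", and the sample ranges are assumptions for illustration, not how Azure Databricks stores or evaluates IP access lists.

```python
import ipaddress
from typing import Sequence


def ip_allowed(client_ip: str, allowed_ranges: Sequence[str]) -> bool:
    """Return True if client_ip is inside any allowed range.

    Assumption for this sketch: an empty list means the recipient has no IP restriction.
    """
    if not allowed_ranges:
        return True
    address = ipaddress.ip_address(client_ip)
    # strict=False accepts ranges written with host bits set, e.g. "203.0.113.5/24".
    # An IPv4 address is never "in" an IPv6 network, and vice versa.
    return any(address in ipaddress.ip_network(r, strict=False) for r in allowed_ranges)


# Hypothetical allow list using documentation address ranges.
allow_list = ["203.0.113.0/24", "2001:db8::/32"]
print(ip_allowed("203.0.113.42", allow_list))  # True
print(ip_allowed("198.51.100.7", allow_list))  # False
```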

Isn’t it insecure to use pre-signed URLs?

Delta Sharing uses pre-signed URLs to provide temporary access to a file in object storage. They are only given to recipients that already have access to the shared data. They are secure because they are short-lived and don’t expand the level of access beyond what recipients have already been granted.
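The properties described above (scoped to one file, valid only for a short time, unforgeable without the storage key) can be illustrated with a generic HMAC-signed URL. This is a hypothetical sketch, not the scheme used by Azure storage (which issues shared access signatures) or by Delta Sharing; the key, host, and query parameter names are placeholders.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

KEY = b"hypothetical-storage-key"      # known only to the storage service
HOST = "https://storage.example.com"    # placeholder host


def _sign(path: str, expires: int) -> str:
    # The signature binds the exact object path to the expiry time.
    return hmac.new(KEY, f"{path}\n{expires}".encode(), hashlib.sha256).hexdigest()


def presign(path: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Issue a URL for one object that stops working after ttl_seconds."""
    issued = time.time() if now is None else now
    expires = int(issued + ttl_seconds)
    return f"{HOST}{path}?" + urlencode({"expires": expires, "sig": _sign(path, expires)})


def verify(url: str, now: Optional[float] = None) -> bool:
    """Accept the URL only if the signature matches this exact path and it has not expired."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    try:
        expires = int(query["expires"][0])
        signature = query["sig"][0]
    except (KeyError, IndexError, ValueError):
        return False
    current = time.time() if now is None else now
    return hmac.compare_digest(signature, _sign(parts.path, expires)) and current < expires


url = presign("/container/table/part-00000.parquet", ttl_seconds=900, now=1_000_000)
print(verify(url, now=1_000_100))                                      # True: within 15 minutes
print(verify(url, now=1_001_000))                                      # False: expired
print(verify(url.replace("part-00000", "part-00001"), now=1_000_100))  # False: different file
```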

Are the tokens used in the Delta Sharing open sharing protocol secure?

Because Delta Sharing enables cross-platform sharing—unlike other available data sharing platforms—the sharing protocol requires an open token. Providers can ensure token security by configuring the token lifetime, setting networking controls, and revoking access on demand. In addition, the token does not expand the level of access beyond what recipients have already been granted. See Security considerations for tokens.

If you prefer not to use tokens to manage access to recipient shares, you should use Databricks-to-Databricks sharing or contact your Databricks account team for alternatives.
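Token lifetime is configured by the provider, and an open-sharing profile file can record when its token expires in an `expirationTime` field. As a sketch, assuming UTC ISO 8601 timestamps such as `2030-01-01T00:00:00.0Z`, a recipient could flag tokens that have expired or are close to expiring. The `token_status` helper and its seven-day warning window are hypothetical choices.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_expiration(value: str) -> datetime:
    """Parse timestamps like 2030-01-01T00:00:00.0Z or 2030-01-01T00:00:00Z (timezone-aware)."""
    for fmt in ("%Y-%m-%dT%H:%M:%S.%f%z", "%Y-%m-%dT%H:%M:%S%z"):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized expirationTime: {value!r}")


def token_status(expiration_time: Optional[str], now: datetime,
                 warn_before: timedelta = timedelta(days=7)) -> str:
    """Classify a token as 'no-expiry', 'expired', 'expiring-soon', or 'valid'."""
    if expiration_time is None:
        return "no-expiry"
    expires = parse_expiration(expiration_time)
    if now >= expires:
        return "expired"
    if expires - now <= warn_before:
        return "expiring-soon"
    return "valid"


now = datetime(2030, 1, 1, tzinfo=timezone.utc)
print(token_status("2029-12-31T00:00:00.0Z", now))    # expired
print(token_status("2030-01-03T12:00:00Z", now))      # expiring-soon
print(token_status("2030-06-01T00:00:00.000Z", now))  # valid
```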

Does Delta Sharing support view sharing?

Yes, Delta Sharing supports view sharing. See Add views to a share.

To learn about planned enhancements to view sharing, contact your Databricks account team.

Limitations

See also Delta Lake feature support matrix.

Resource quotas

Azure Databricks enforces resource quotas on all Delta Sharing securable objects. These quotas are listed in Resource limits. If you expect to exceed these resource limits, contact your Azure Databricks account team.

You can monitor your quota usage using the Unity Catalog resource quotas APIs. See Monitor your usage of Unity Catalog resource quotas.

Next steps