This page provides an overview of using the Unity REST API to access Unity Catalog managed and external tables from external Delta Lake clients. To create external Delta Lake tables from external clients, see Create external Delta tables from external clients.
Use the Iceberg REST catalog to read Unity Catalog-registered tables on Azure Databricks from supported Iceberg clients, including Apache Spark and DuckDB.
For a full list of supported integrations, see Unity Catalog integrations.
Tip
For information about how to read Azure Databricks data using Microsoft Fabric, see Use Microsoft Fabric to read data that is registered in Unity Catalog.
Read and write using the Unity REST API
The Unity REST API provides external clients read access to tables registered to Unity Catalog. Some clients also support creating tables and writing to existing tables.
Configure access using your workspace URL. The Unity Catalog Spark client automatically routes requests to the appropriate Unity Catalog API endpoint.
Important
The workspace URL used for the Unity REST API endpoint must include the workspace ID. Without the workspace ID, API requests may return a 303 redirect to a login page instead of the expected response.
To find your workspace URL and workspace ID, see Workspace instance names, URLs, and IDs.
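As a hedged illustration of why the full workspace URL matters, the sketch below assembles a Unity Catalog REST endpoint from a workspace URL that includes the workspace ID. The host shown is a made-up example, not a real workspace, and `uc_endpoint` is a helper written for this page, not a Databricks library function.

```python
# Hypothetical sketch: assembling a Unity Catalog REST endpoint from a
# workspace URL that includes the workspace ID. The host below is an
# example value, not a real workspace.
def uc_endpoint(workspace_url: str, path: str) -> str:
    """Join a workspace URL and a Unity Catalog REST API path."""
    return f"https://{workspace_url}/api/2.1/unity-catalog/{path}"

url = uc_endpoint("adb-1234567890123456.12.azuredatabricks.net", "catalogs")
print(url)
```

Requests sent to this endpoint with valid credentials list the catalogs the principal can see; without the workspace ID in the host, the same request may hit the 303 redirect described above.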
Requirements
Azure Databricks supports Unity REST API access to tables as part of Unity Catalog. You must have Unity Catalog enabled in your workspace to use these endpoints. The following table types are eligible for Unity REST API reads:
- Unity Catalog managed tables.
- Unity Catalog external tables.
To configure read access to Azure Databricks objects from Delta Lake clients using the Unity REST API, complete the following steps:
- Enable External data access for your metastore. See Enable external data access on the metastore.
- Grant the principal configuring the integration the `EXTERNAL USE SCHEMA` privilege on the schema containing the objects. See Grant a principal Unity Catalog privileges.
- Authenticate using one of the following methods:
- OAuth machine-to-machine (M2M) authentication: Supports automatic credential and token refresh for long-running Spark jobs (>1 hour). See Authorize service principal access to Azure Databricks with OAuth.
- Personal access token (PAT): See Authorize access to Azure Databricks resources.
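The `EXTERNAL USE SCHEMA` grant in the steps above is issued with a SQL statement from a Databricks SQL session. The sketch below composes such a statement; the schema and principal names are hypothetical examples, not values from this page.

```python
# Hedged sketch: composing the GRANT statement for the EXTERNAL USE SCHEMA
# privilege described above. The schema and principal names are examples.
schema = "main.default"                    # example <catalog>.<schema>
principal = "external-reader@example.com"  # example principal
grant_sql = f"GRANT EXTERNAL USE SCHEMA ON SCHEMA {schema} TO `{principal}`"
print(grant_sql)  # run the printed statement as a sufficiently privileged user
```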
Read Delta Lake tables with Apache Spark using OAuth authentication
Azure Databricks supports OAuth machine-to-machine (M2M) authentication. OAuth automatically handles token renewal for Unity Catalog authentication. For long-running jobs that also require automatic cloud storage credential renewal, enable the `spark.sql.catalog.<uc-catalog-name>.renewCredential.enabled` setting in your Spark configuration.
OAuth authentication for external Apache Spark clients requires:
- Unity Catalog Spark client version 0.3.1 or later (`io.unitycatalog:unitycatalog-spark`)
- Apache Spark 4.0 or later
- Delta Lake Spark 4.0.1 or later with OAuth support
- An OAuth M2M service principal with appropriate permissions. See Authorize service principal access to Azure Databricks with OAuth.
The following configuration is required to read Unity Catalog managed tables and external Delta Lake tables with Apache Spark using OAuth authentication:
"spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
"spark.sql.catalog.spark_catalog": "io.unitycatalog.spark.UCSingleCatalog",
"spark.sql.catalog.<uc-catalog-name>": "io.unitycatalog.spark.UCSingleCatalog",
"spark.sql.catalog.<uc-catalog-name>.uri": "<workspace-url>",
"spark.sql.catalog.<uc-catalog-name>.auth.type": "oauth",
"spark.sql.catalog.<uc-catalog-name>.auth.oauth.uri": "<oauth-token-endpoint>",
"spark.sql.catalog.<uc-catalog-name>.auth.oauth.clientId": "<oauth-client-id>",
"spark.sql.catalog.<uc-catalog-name>.auth.oauth.clientSecret": "<oauth-client-secret>",
"spark.sql.catalog.<uc-catalog-name>.renewCredential.enabled": "true",
"spark.sql.defaultCatalog": "<uc-catalog-name>",
"spark.jars.packages": "io.delta:delta-spark_2.13:4.0.1,io.unitycatalog:unitycatalog-spark_2.13:0.3.1,org.apache.hadoop:hadoop-azure:3.3.6"
Substitute the following variables:
- `<uc-catalog-name>`: The name of the catalog in Unity Catalog that contains your tables.
- `<oauth-token-endpoint>`: OAuth token endpoint URL. To construct this URL:
  - Locate your Azure Databricks account ID. See Locate your account ID.
  - Use the format: `https://accounts.azuredatabricks.net/oidc/accounts/<account-id>/v1/token`
- `<oauth-client-id>`: OAuth client ID for your service principal. See Authorize service principal access to Azure Databricks with OAuth.
- `<oauth-client-secret>`: OAuth client secret for your service principal. See Authorize service principal access to Azure Databricks with OAuth.
- `<workspace-url>`: The Azure Databricks workspace URL, including the workspace ID. For example, `adb-1234567890123456.12.azuredatabricks.net`.
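The OAuth configuration above can also be assembled programmatically before building a SparkSession. This is a minimal sketch, assuming pyspark and the listed packages are available at runtime; all values passed in (catalog name, workspace URL, endpoint, client ID and secret) are placeholders, and `oauth_uc_conf` is a helper written for this page.

```python
# Hedged sketch: the OAuth Spark configuration above, built as a dict.
# Every value passed in below is a placeholder, not a real credential.
def oauth_uc_conf(catalog, workspace_url, token_endpoint, client_id, client_secret):
    """Return the Spark config entries for OAuth access to Unity Catalog."""
    base = f"spark.sql.catalog.{catalog}"
    return {
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog": "io.unitycatalog.spark.UCSingleCatalog",
        base: "io.unitycatalog.spark.UCSingleCatalog",
        f"{base}.uri": workspace_url,
        f"{base}.auth.type": "oauth",
        f"{base}.auth.oauth.uri": token_endpoint,
        f"{base}.auth.oauth.clientId": client_id,
        f"{base}.auth.oauth.clientSecret": client_secret,
        f"{base}.renewCredential.enabled": "true",
        "spark.sql.defaultCatalog": catalog,
    }

conf = oauth_uc_conf(
    "my_catalog",
    "adb-1234567890123456.12.azuredatabricks.net",
    "https://accounts.azuredatabricks.net/oidc/accounts/<account-id>/v1/token",
    "<oauth-client-id>",
    "<oauth-client-secret>",
)

# With pyspark installed, the dict can be applied to a SparkSession builder:
# from pyspark.sql import SparkSession
# builder = SparkSession.builder.appName("uc-oauth-read")
# for key, value in conf.items():
#     builder = builder.config(key, value)
# spark = builder.getOrCreate()
# spark.sql("SELECT * FROM my_catalog.my_schema.my_table LIMIT 10").show()
```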
Note
The package versions shown above are current as of the last update to this page. Newer versions may be available. Verify that package versions are compatible with your Spark version.
Read Delta Lake tables with Apache Spark using PAT authentication
Important
Databricks recommends using OAuth instead of PATs for user account authentication because OAuth provides stronger security. To learn how to authenticate with a Databricks user account using OAuth, see Authorize user access to Azure Databricks with OAuth.
Configuration
The following packages or JARs must be included in your Spark configuration:
- Delta Lake Spark (`io.delta:delta-spark`): Provides Delta Lake support for Apache Spark.
- Unity Catalog Spark connector (`io.unitycatalog:unitycatalog-spark`): Connects Apache Spark to Unity Catalog.
- Cloud storage connector: Required to access the cloud object storage backing your tables. The connector depends on your cloud provider:
  - AWS: `org.apache.hadoop:hadoop-aws` provides S3 filesystem support.
  - Azure: `org.apache.hadoop:hadoop-azure` provides Azure Data Lake Storage Gen2 support.
  - GCP: the `gcs-connector` JAR provides Google Cloud Storage support. Download the JAR separately and reference it using `spark.jars`.
For additional cloud-specific configurations, see the Unity Catalog OSS documentation.
"spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
"spark.sql.catalog.spark_catalog": "io.unitycatalog.spark.UCSingleCatalog",
"spark.sql.catalog.<uc-catalog-name>": "io.unitycatalog.spark.UCSingleCatalog",
"spark.sql.catalog.<uc-catalog-name>.uri": "<workspace-url>",
"spark.sql.catalog.<uc-catalog-name>.token": "<token>",
"spark.sql.defaultCatalog": "<uc-catalog-name>",
"spark.jars.packages": "io.delta:delta-spark_2.13:4.0.1,io.unitycatalog:unitycatalog-spark_2.13:0.3.1,org.apache.hadoop:hadoop-azure:3.3.6"
Variables
Substitute the following variables in the configuration:
- `<uc-catalog-name>`: The name of the Unity Catalog catalog that contains your tables.
- `<token>`: Personal access token (PAT) for the principal configuring the integration.
- `<workspace-url>`: The Azure Databricks workspace URL, including the workspace ID. For example, `adb-1234567890123456.12.azuredatabricks.net`.
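The PAT configuration above can likewise be built as a dict before constructing a SparkSession. This is a minimal sketch under the same assumptions as the configuration block (pyspark plus the listed packages); the catalog name, workspace URL, and token are placeholder values, and `pat_uc_conf` is a helper written for this page.

```python
# Hedged sketch: the PAT-based Spark configuration above, built as a dict.
# The catalog name, workspace URL, and token below are placeholder values.
def pat_uc_conf(catalog, workspace_url, token):
    """Return the Spark config entries for PAT access to Unity Catalog."""
    base = f"spark.sql.catalog.{catalog}"
    return {
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog": "io.unitycatalog.spark.UCSingleCatalog",
        base: "io.unitycatalog.spark.UCSingleCatalog",
        f"{base}.uri": workspace_url,
        f"{base}.token": token,
        "spark.sql.defaultCatalog": catalog,
    }

conf = pat_uc_conf(
    "my_catalog",
    "adb-1234567890123456.12.azuredatabricks.net",
    "<token>",
)

# With pyspark installed, apply the dict to a SparkSession builder:
# from pyspark.sql import SparkSession
# builder = SparkSession.builder.appName("uc-pat-read")
# for key, value in conf.items():
#     builder = builder.config(key, value)
# spark = builder.getOrCreate()
# spark.table("my_catalog.my_schema.my_table").show()
```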
Credential renewal
To enable automatic credential renewal for long-running jobs, add the following configuration:
"spark.sql.catalog.<uc-catalog-name>.renewCredential.enabled": "true"
Note
The package versions shown above are current as of the last update to this page. Newer versions might be available. Verify that package versions are compatible with your Databricks Runtime and Spark versions.