Compute configuration for Databricks Connect

2025-05-10

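Note

This article covers Databricks Connect for Databricks Runtime 13.3 LTS and above.

In this article, you configure properties to establish a connection between Databricks Connect and your Azure Databricks cluster or serverless compute. This information applies to both the Python and Scala versions of Databricks Connect unless stated otherwise.

Databricks Connect enables you to connect popular IDEs such as Visual Studio Code, PyCharm, RStudio Desktop, and IntelliJ IDEA, as well as notebook servers and other custom applications, to Azure Databricks clusters. See What is Databricks Connect?.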

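Requirements

To configure a connection to Databricks compute, you must have:

Databricks Connect installed. For installation requirements and steps for a specific language version of Databricks Connect, see the installation article for that language.
An Azure Databricks account and workspace that have Unity Catalog enabled. See Get started with Unity Catalog and Enable a workspace for Unity Catalog.
A Databricks Runtime version on your compute that is greater than or equal to the Databricks Connect package version. Databricks recommends that you use the most recent Databricks Connect package that matches your Databricks Runtime version. For compute version requirements, see the version support matrix for Databricks Connect for Python or Databricks Connect for Scala.

To use features that are available in later versions of Databricks Runtime, you must upgrade the Databricks Connect package. For the list of available Databricks Connect releases, see the Databricks Connect release notes. For Databricks Runtime release notes, see Databricks Runtime release notes versions and compatibility.
If you use classic compute, the cluster must use the Assigned or Shared cluster access mode. See Access modes.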
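The version requirement above can be checked before you attempt a connection. The following is a minimal sketch, not part of Databricks Connect: `parse_major_minor` and `runtime_supports_connect` are hypothetical helper names, and the sketch assumes that comparing the major and minor version numbers is sufficient.

```python
import re
from typing import Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from strings such as '14.3', '14.3.1', or '13.3 LTS'."""
    match = re.match(r"\s*(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def runtime_supports_connect(runtime_version: str, connect_version: str) -> bool:
    """True when the compute's runtime is at or above the client package version."""
    return parse_major_minor(runtime_version) >= parse_major_minor(connect_version)


print(runtime_supports_connect("14.3", "13.3.2"))      # True
print(runtime_supports_connect("13.3 LTS", "14.3.0"))  # False
```

A check like this only guards against the most common mismatch; the version support matrix remains the authoritative source.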

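Setup

Before you begin, you need the following:

If you are connecting to a cluster, the ID of your cluster. You can retrieve the cluster ID from the URL. See Cluster URL and ID.
The Azure Databricks workspace instance name. This is the Server Hostname value for your compute. See Get connection details for an Azure Databricks compute resource.
Any other properties that are required for the Databricks authentication type that you want to use.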
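Because the cluster ID can be read from the cluster's URL, a small helper can extract it. This is a sketch under an assumption: it presumes the ID is the path segment that follows `/clusters/` in the URL, and the workspace host and cluster ID shown are made-up placeholders. `cluster_id_from_url` is a hypothetical name.

```python
import re
from typing import Optional


def cluster_id_from_url(url: str) -> Optional[str]:
    """Return the segment after '/clusters/' in a cluster URL, or None if absent."""
    match = re.search(r"/clusters/([^/?#]+)", url)
    return match.group(1) if match else None


# Placeholder URL; the exact layout of your workspace URL may differ.
example = "https://adb-1234567890123456.7.azuredatabricks.net/#setting/clusters/0123-456789-abcdefgh/configuration"
print(cluster_id_from_url(example))  # 0123-456789-abcdefgh
```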

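Note

OAuth user-to-machine (U2M) authentication is supported on the Databricks SDK for Python 0.19.0 and above. Update your code project's installed version of the Databricks SDK for Python to 0.19.0 or above to use OAuth U2M authentication. See Get started with the Databricks SDK for Python.

For OAuth U2M authentication, you must use the Databricks CLI to authenticate before you run your Python code. See the Tutorial.
OAuth machine-to-machine (M2M) authentication is supported on the Databricks SDK for Python 0.18.0 and above. Update your installed version of the Databricks SDK for Python to 0.18.0 or above to use OAuth M2M authentication. See Get started with the Databricks SDK for Python.
The Databricks SDK for Python has not yet implemented Azure managed identities authentication.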
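The minimum SDK versions in the note above can be encoded as a small lookup. This is an illustrative sketch only: the keys `oauth-u2m` and `oauth-m2m` and the function names are hypothetical labels chosen for this example, not identifiers from the SDK.

```python
import re
from typing import Tuple

# Minimum Databricks SDK for Python versions stated in the note above.
MIN_SDK_VERSION = {
    "oauth-u2m": (0, 19, 0),
    "oauth-m2m": (0, 18, 0),
}


def parse_version(version: str) -> Tuple[int, int, int]:
    """Turn '0.19' or '0.19.0' into a three-part tuple, padding with zeros."""
    numbers = [int(part) for part in re.findall(r"\d+", version)[:3]]
    numbers += [0] * (3 - len(numbers))
    return numbers[0], numbers[1], numbers[2]


def sdk_supports(auth_type: str, sdk_version: str) -> bool:
    """True when the given SDK version meets the minimum for the auth type."""
    return parse_version(sdk_version) >= MIN_SDK_VERSION[auth_type]


print(sdk_supports("oauth-u2m", "0.18.0"))  # False
print(sdk_supports("oauth-m2m", "0.18.0"))  # True
```

In a real project, the installed version could be read with `importlib.metadata.version("databricks-sdk")` and passed to `sdk_supports`.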

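Configure a connection to a cluster

There are multiple ways to configure the connection to your cluster. Databricks Connect searches for configuration properties in the following order and uses the first configuration it finds. For advanced configuration information, see Advanced usage of Databricks Connect for Python.

The remote() method of the DatabricksSession class
A Databricks configuration profile
The DATABRICKS_CONFIG_PROFILE environment variable
An environment variable for each configuration property
A Databricks configuration profile named DEFAULT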
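The "first configuration found wins" rule above can be pictured with a short sketch. This is not how Databricks Connect is implemented; it only illustrates the ordering idea. The loader functions, host names, and cluster ID are hypothetical stand-ins.

```python
from typing import Callable, Optional, Sequence, Tuple

Source = Tuple[str, Callable[[], Optional[dict]]]


def first_configuration(sources: Sequence[Source]) -> Tuple[str, dict]:
    """Walk the sources in order and return the first one that yields properties."""
    for name, load in sources:
        config = load()
        if config:
            return name, config
    raise LookupError("no configuration source returned any properties")


# Stand-in loaders: the first three find nothing, so the fourth is used
# even though the DEFAULT profile also has a value.
sources = [
    ("remote() arguments", lambda: None),
    ("configuration profile", lambda: None),
    ("DATABRICKS_CONFIG_PROFILE", lambda: None),
    ("per-property environment variables",
     lambda: {"host": "https://example.invalid", "cluster_id": "0123-456789-abcdefgh"}),
    ("DEFAULT profile", lambda: {"host": "https://other.invalid"}),
]

name, config = first_configuration(sources)
print(name)  # per-property environment variables
```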

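The `remote()` method of the `DatabricksSession` class

This option applies only to Azure Databricks personal access token authentication. Specify the workspace instance name, the Azure Databricks personal access token, and the ID of the cluster.

You can initialize the DatabricksSession class in several ways:

Set the host, token, and cluster_id fields in DatabricksSession.builder.remote().
Use the Databricks SDK's Config class.
Specify a Databricks configuration profile along with the cluster_id field.

Instead of specifying these connection properties in your code, Databricks recommends configuring them through environment variables or configuration files, as described throughout this section. The following code examples assume that you provide some implementation of the proposed retrieve_* functions to get the necessary properties from the user or from some other configuration store, such as Azure KeyVault.

The code for each of these approaches is as follows:

Python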

# Set the host, token, and cluster_id fields in DatabricksSession.builder.remote.
# If you have already set the DATABRICKS_CLUSTER_ID environment variable with the
# cluster's ID, you do not also need to set the cluster_id field here.
from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.remote(
    host       = f"https://{retrieve_workspace_instance_name()}",
    token      = retrieve_token(),
    cluster_id = retrieve_cluster_id()
).getOrCreate()

Scala

// Set the host, token, and clusterId fields in DatabricksSession.builder.
// If you have already set the DATABRICKS_CLUSTER_ID environment variable with the
// cluster's ID, you do not also need to set the clusterId field here.
import com.databricks.connect.DatabricksSession

val spark = DatabricksSession.builder()
    .host(retrieveWorkspaceInstanceName())
    .token(retrieveToken())
    .clusterId(retrieveClusterId())
    .getOrCreate()

Python

# Use the Databricks SDK's Config class.
# If you have already set the DATABRICKS_CLUSTER_ID environment variable with the
# cluster's ID, you do not also need to set the cluster_id field here.
from databricks.connect import DatabricksSession
from databricks.sdk.core import Config

config = Config(
    host       = f"https://{retrieve_workspace_instance_name()}",
    token      = retrieve_token(),
    cluster_id = retrieve_cluster_id()
)

spark = DatabricksSession.builder.sdkConfig(config).getOrCreate()

Scala

// Use the Databricks SDK's Config class.
// If you have already set the DATABRICKS_CLUSTER_ID environment variable with the
// cluster's ID, you do not also need to set the clusterId field here.
import com.databricks.connect.DatabricksSession
import com.databricks.sdk.core.DatabricksConfig

val config = new DatabricksConfig()
    .setHost(retrieveWorkspaceInstanceName())
    .setToken(retrieveToken())
val spark = DatabricksSession.builder()
    .sdkConfig(config)
    .clusterId(retrieveClusterId())
    .getOrCreate()

Python

# Specify a Databricks configuration profile along with the `cluster_id` field.
# If you have already set the DATABRICKS_CLUSTER_ID environment variable with the
# cluster's ID, you do not also need to set the cluster_id field here.
from databricks.connect import DatabricksSession
from databricks.sdk.core import Config

config = Config(
    profile    = "<profile-name>",
    cluster_id = retrieve_cluster_id()
)

spark = DatabricksSession.builder.sdkConfig(config).getOrCreate()

Scala

// Specify a Databricks configuration profile along with the clusterId field.
// If you have already set the DATABRICKS_CLUSTER_ID environment variable with the
// cluster's ID, you do not also need to set the clusterId field here.
import com.databricks.connect.DatabricksSession
import com.databricks.sdk.core.DatabricksConfig

val config = new DatabricksConfig()
    .setProfile("<profile-name>")
val spark = DatabricksSession.builder()
    .sdkConfig(config)
    .clusterId(retrieveClusterId())
    .getOrCreate()

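A Databricks configuration profile

For this option, create or identify an Azure Databricks configuration profile that contains the cluster_id field and any other fields that are required for the Databricks authentication type that you want to use.

The required configuration profile fields for each authentication type are as follows:

For Azure Databricks personal access token authentication: host and token.
For OAuth machine-to-machine (M2M) authentication (where supported): host, client_id, and client_secret.
For OAuth user-to-machine (U2M) authentication (where supported): host.
For Microsoft Entra ID (formerly Azure Active Directory) service principal authentication: host, azure_tenant_id, azure_client_id, azure_client_secret, and possibly azure_workspace_resource_id.
For Azure CLI authentication: host.
For Azure managed identities authentication (where supported): host, azure_use_msi, azure_client_id, and possibly azure_workspace_resource_id.

Then set the name of this configuration profile through the configuration class.

You can specify cluster_id in either of two ways:

Include the cluster_id field in your configuration profile, and then specify only the configuration profile's name.
Specify the configuration profile name along with the cluster_id field.

If you have already set the DATABRICKS_CLUSTER_ID environment variable to the cluster's ID, you do not also need to specify cluster_id.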
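A configuration profile is an INI-style section, so Python's standard `configparser` can write and read one. The sketch below assumes personal access token authentication and writes to a temporary file rather than the real profile file (typically `~/.databrickscfg`). The profile name, host, token placeholder, and cluster ID are all made up for illustration, and `write_profile` and `read_profile` are hypothetical helpers.

```python
import configparser
import tempfile
from pathlib import Path


def write_profile(path: Path, name: str, fields: dict) -> None:
    """Add or replace one profile section, keeping other profiles in the file."""
    parser = configparser.ConfigParser()
    parser.read(path)  # silently skipped if the file does not exist yet
    parser[name] = fields
    with path.open("w") as handle:
        parser.write(handle)


def read_profile(path: Path, name: str) -> dict:
    """Return the fields of one profile section as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read(path)
    return dict(parser[name])


with tempfile.TemporaryDirectory() as tmp:
    cfg_path = Path(tmp) / "databrickscfg"
    write_profile(cfg_path, "my-cluster-profile", {
        "host": "https://adb-1234567890123456.7.azuredatabricks.net",
        "token": "<personal-access-token>",
        "cluster_id": "0123-456789-abcdefgh",
    })
    profile = read_profile(cfg_path, "my-cluster-profile")

print(sorted(profile))  # ['cluster_id', 'host', 'token']
```

With `cluster_id` stored in the profile, only the profile name has to be passed to the session builder.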

The code for each of these approaches is as follows:

Python

# Include the cluster_id field in your configuration profile, and then
# just specify the configuration profile's name:
from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.profile("<profile-name>").getOrCreate()

Scala

// Include the cluster_id field in your configuration profile, and then
// just specify the configuration profile's name:
import com.databricks.connect.DatabricksSession
import com.databricks.sdk.core.DatabricksConfig

val config = new DatabricksConfig()
    .setProfile("<profile-name>")
val spark = DatabricksSession.builder()
    .sdkConfig(config)
    .getOrCreate()

Python

# Specify the configuration profile name along with the cluster_id field.
# In this example, retrieve_cluster_id() assumes some custom implementation that
# you provide to get the cluster ID from the user or from some other
# configuration store:
from databricks.connect import DatabricksSession
from databricks.sdk.core import Config

config = Config(
    profile    = "<profile-name>",
    cluster_id = retrieve_cluster_id()
)

spark = DatabricksSession.builder.sdkConfig(config).getOrCreate()

Scala

// Specify a Databricks configuration profile along with the clusterId field.
// If you have already set the DATABRICKS_CLUSTER_ID environment variable with the
// cluster's ID, you do not also need to set the clusterId field here.
import com.databricks.connect.DatabricksSession
import com.databricks.sdk.core.DatabricksConfig

val config = new DatabricksConfig()
    .setProfile("<profile-name>")
val spark = DatabricksSession.builder()
    .sdkConfig(config)
    .clusterId(retrieveClusterId())
    .getOrCreate()

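The `DATABRICKS_CONFIG_PROFILE` environment variable

For this option, create or identify an Azure Databricks configuration profile that contains the cluster_id field and any other fields that are required for the Databricks authentication type that you want to use.

If you have already set the DATABRICKS_CLUSTER_ID environment variable to the cluster's ID, you do not also need to specify cluster_id.

The required configuration profile fields for each authentication type are as follows:

For Azure Databricks personal access token authentication: host and token.
For OAuth machine-to-machine (M2M) authentication (where supported): host, client_id, and client_secret.
For OAuth user-to-machine (U2M) authentication (where supported): host.
For Microsoft Entra ID (formerly Azure Active Directory) service principal authentication: host, azure_tenant_id, azure_client_id, azure_client_secret, and possibly azure_workspace_resource_id.
For Azure CLI authentication: host.
For Azure managed identities authentication (where supported): host, azure_use_msi, azure_client_id, and possibly azure_workspace_resource_id.

Set the DATABRICKS_CONFIG_PROFILE environment variable to the name of this configuration profile. Then initialize the DatabricksSession class:

Python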

from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.getOrCreate()

Scala

import com.databricks.connect.DatabricksSession

val spark = DatabricksSession.builder().getOrCreate()
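If you prefer to select the profile from within a Python program rather than from the shell, the environment variable can be set with `os.environ` before the session is built. This is a minimal sketch; `use_profile` is a hypothetical helper and `my-cluster-profile` is a placeholder profile name.

```python
import os


def use_profile(profile_name: str) -> None:
    """Set DATABRICKS_CONFIG_PROFILE for this process and its child processes."""
    os.environ["DATABRICKS_CONFIG_PROFILE"] = profile_name


use_profile("my-cluster-profile")  # placeholder profile name
print(os.environ["DATABRICKS_CONFIG_PROFILE"])  # my-cluster-profile

# With the variable set, the session is built with no explicit arguments:
# from databricks.connect import DatabricksSession
# spark = DatabricksSession.builder.getOrCreate()
```

Setting the variable this way affects only the current process, so it does not change the profile used by other terminals or tools.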

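An environment variable for each configuration property

For this option, set the DATABRICKS_CLUSTER_ID environment variable and any other environment variables that are required for the Databricks authentication type that you want to use.

The required environment variables for each authentication type are as follows:

For Azure Databricks personal access token authentication: DATABRICKS_HOST and DATABRICKS_TOKEN.
For OAuth machine-to-machine (M2M) authentication (where supported): DATABRICKS_HOST, DATABRICKS_CLIENT_ID, and DATABRICKS_CLIENT_SECRET.
For OAuth user-to-machine (U2M) authentication (where supported): DATABRICKS_HOST.
For Microsoft Entra ID (formerly Azure Active Directory) service principal authentication: DATABRICKS_HOST, ARM_TENANT_ID, ARM_CLIENT_ID, ARM_CLIENT_SECRET, and possibly DATABRICKS_AZURE_RESOURCE_ID.
For Azure CLI authentication: DATABRICKS_HOST.
For Azure managed identities authentication (where supported): DATABRICKS_HOST, ARM_USE_MSI, ARM_CLIENT_ID, and possibly DATABRICKS_AZURE_RESOURCE_ID.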
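A pre-flight check can report which of the variables listed above are still unset. The sketch below encodes only the always-required variables from that list (the "possibly" required DATABRICKS_AZURE_RESOURCE_ID is left out). The dictionary keys such as `pat` and `azure-cli` are hypothetical labels for this example, not values recognized by Databricks.

```python
import os
from typing import List, Mapping

# Required variables per authentication type, taken from the list above.
REQUIRED_ENV_VARS = {
    "pat": ["DATABRICKS_HOST", "DATABRICKS_TOKEN"],
    "oauth-m2m": ["DATABRICKS_HOST", "DATABRICKS_CLIENT_ID", "DATABRICKS_CLIENT_SECRET"],
    "oauth-u2m": ["DATABRICKS_HOST"],
    "azure-service-principal": ["DATABRICKS_HOST", "ARM_TENANT_ID", "ARM_CLIENT_ID", "ARM_CLIENT_SECRET"],
    "azure-cli": ["DATABRICKS_HOST"],
    "azure-msi": ["DATABRICKS_HOST", "ARM_USE_MSI", "ARM_CLIENT_ID"],
}


def missing_env_vars(auth_type: str, env: Mapping[str, str] = os.environ) -> List[str]:
    """List required variables that are unset or empty, cluster ID included."""
    required = ["DATABRICKS_CLUSTER_ID", *REQUIRED_ENV_VARS[auth_type]]
    return [name for name in required if not env.get(name)]


sample_env = {"DATABRICKS_HOST": "https://example.invalid", "DATABRICKS_TOKEN": "placeholder"}
print(missing_env_vars("pat", sample_env))  # ['DATABRICKS_CLUSTER_ID']
```

Calling `missing_env_vars("pat")` without the second argument inspects the real process environment.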

Then initialize the DatabricksSession class:

Python

from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.getOrCreate()

Scala

import com.databricks.connect.DatabricksSession

val spark = DatabricksSession.builder().getOrCreate()

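A Databricks configuration profile named `DEFAULT`

For this option, create or identify an Azure Databricks configuration profile that contains the cluster_id field and any other fields that are required for the Databricks authentication type that you want to use.

If you have already set the DATABRICKS_CLUSTER_ID environment variable to the cluster's ID, you do not also need to specify cluster_id.

The required configuration profile fields for each authentication type are as follows:

For Azure Databricks personal access token authentication: host and token.
For OAuth machine-to-machine (M2M) authentication (where supported): host, client_id, and client_secret.
For OAuth user-to-machine (U2M) authentication (where supported): host.
For Microsoft Entra ID (formerly Azure Active Directory) service principal authentication: host, azure_tenant_id, azure_client_id, azure_client_secret, and possibly azure_workspace_resource_id.
For Azure CLI authentication: host.
For Azure managed identities authentication (where supported): host, azure_use_msi, azure_client_id, and possibly azure_workspace_resource_id.

Name this configuration profile DEFAULT.

Then initialize the DatabricksSession class:

Python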

from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.getOrCreate()

Scala

import com.databricks.connect.DatabricksSession

val spark = DatabricksSession.builder().getOrCreate()

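Configure a connection to serverless compute

Important

This feature is in Public Preview.

Databricks Connect for Python supports connecting to serverless compute. To use this feature, the version requirements for connecting to serverless must be met. See Requirements.

Important

This feature has the following limitations:

This feature is supported only in Databricks Connect for Python.
The Python and Databricks Connect versions must be compatible. See the version support matrix.
All Databricks Connect for Python limitations apply.
All serverless compute limitations apply.
Only Python dependencies that are included in the serverless compute environment can be used for UDFs. See Serverless environment versions. Additional dependencies cannot be installed.
UDFs with custom modules are not supported.

You can configure a connection to serverless compute in one of the following ways:

Set the local environment variable DATABRICKS_SERVERLESS_COMPUTE_ID to auto. If this environment variable is set, Databricks Connect ignores cluster_id.
In a local Databricks configuration profile, set serverless_compute_id = auto, and then reference that profile from your code.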
```
[DEFAULT]
host = https://my-workspace.cloud.databricks.com/
serverless_compute_id = auto
token = dapi123...
```
Or use one of the following options:

from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.serverless(True).getOrCreate()

from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.remote(serverless=True).getOrCreate()
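The precedence rule stated above, where DATABRICKS_SERVERLESS_COMPUTE_ID set to auto causes cluster_id to be ignored, can be sketched as a small decision function. This is an illustration of the rule only, not Databricks Connect's actual resolution code; `compute_target` and its return strings are hypothetical.

```python
from typing import Mapping, Optional


def compute_target(env: Mapping[str, str], cluster_id: Optional[str] = None) -> str:
    """Describe which compute would be used under the rule described above."""
    # Serverless wins whenever the variable is set to "auto".
    if env.get("DATABRICKS_SERVERLESS_COMPUTE_ID") == "auto":
        return "serverless"
    cluster = cluster_id or env.get("DATABRICKS_CLUSTER_ID")
    if cluster:
        return f"cluster:{cluster}"
    raise ValueError("neither serverless compute nor a cluster ID is configured")


print(compute_target({"DATABRICKS_SERVERLESS_COMPUTE_ID": "auto"}, "0123-456789-abcdefgh"))
# serverless
print(compute_target({"DATABRICKS_CLUSTER_ID": "0123-456789-abcdefgh"}))
# cluster:0123-456789-abcdefgh
```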

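Note

The serverless compute session times out after 10 minutes of inactivity. After that, create a new Spark session with getOrCreate() to connect to serverless compute again.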
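One way to handle the idle timeout in a long-running program is to track when the session was last used and build a fresh one once the limit has passed. The following is a sketch under stated assumptions: `IdleAwareSession` is a hypothetical wrapper, the session is created by a factory you supply (for example a function that calls `DatabricksSession.builder.serverless(True).getOrCreate()`), and the clock is injectable so the behavior can be demonstrated without waiting.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")
IDLE_TIMEOUT_SECONDS = 10 * 60  # the 10-minute limit described in the note above


class IdleAwareSession(Generic[T]):
    """Hand out a session, rebuilding it after the idle limit has elapsed."""

    def __init__(self, factory: Callable[[], T],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._factory = factory
        self._clock = clock
        self._session: Optional[T] = None
        self._last_used = 0.0

    def get(self) -> T:
        now = self._clock()
        expired = now - self._last_used >= IDLE_TIMEOUT_SECONDS
        if self._session is None or expired:
            self._session = self._factory()
        self._last_used = now
        return self._session


# Demonstration with a fake clock and a stand-in factory that counts sessions.
fake_now = [0.0]
created = []
holder = IdleAwareSession(lambda: created.append(object()) or len(created),
                          clock=lambda: fake_now[0])
first = holder.get()
fake_now[0] = 5 * 60
same = holder.get()       # still within the idle limit
fake_now[0] = 16 * 60
fresh = holder.get()      # more than 10 minutes since the last use
print(first, same, fresh)  # 1 1 2
```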

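Validate the connection to Databricks

To validate that your environment, default credentials, and connection to compute are correctly set up for Databricks Connect, run the databricks-connect test command. The command fails with a non-zero exit code and a corresponding error message when it detects any incompatibility in the setup.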

databricks-connect test
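Because the command signals failure through its exit code, it can be wrapped in a script or CI step. The sketch below uses Python's `subprocess` module; `connection_is_valid` is a hypothetical helper, and the demonstration calls substitute the Python interpreter for the real command so the example runs without Databricks Connect installed.

```python
import subprocess
import sys
from typing import Sequence


def connection_is_valid(command: Sequence[str] = ("databricks-connect", "test")) -> bool:
    """Run the validation command and report whether it exited with code 0."""
    try:
        result = subprocess.run(list(command), capture_output=True, text=True)
    except FileNotFoundError:
        print(f"command not found: {command[0]}", file=sys.stderr)
        return False
    if result.returncode != 0:
        # Surface the command's own error message on failure.
        print(result.stderr, file=sys.stderr)
    return result.returncode == 0


# Stand-in commands that exit with 0 and 3, in place of the real CLI.
print(connection_is_valid([sys.executable, "-c", "raise SystemExit(0)"]))  # True
print(connection_is_valid([sys.executable, "-c", "raise SystemExit(3)"]))  # False
```

Calling `connection_is_valid()` with no arguments runs the actual `databricks-connect test` command.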

In Databricks Connect 14.3 and above, you can also validate your environment by using validateSession():

DatabricksSession.builder.validateSession(True).getOrCreate()

Disable Databricks Connect

Databricks Connect and the underlying Spark Connect service can be disabled on any given cluster.

To disable the Databricks Connect service, set the following Spark configuration on the cluster.

spark.databricks.service.server.enabled false
