Databricks Utilities with Databricks Connect for Python
Note
This article covers Databricks Connect for Databricks Runtime 13.3 LTS and above.
This article describes how to use Databricks Utilities with Databricks Connect for Python. Databricks Connect enables you to connect popular IDEs, notebook servers, and custom applications to Azure Databricks clusters. See What is Databricks Connect?. For the Scala version of this article, see Databricks Utilities with Databricks Connect for Scala.
Note
Before you begin to use Databricks Connect, you must set up the Databricks Connect client.
You use Databricks Connect to access Databricks Utilities as follows:
- Use the WorkspaceClient class's dbutils variable to access Databricks Utilities. The WorkspaceClient class belongs to the Databricks SDK for Python and is included in Databricks Connect.
- Use dbutils.fs to access the Databricks Utilities fs utility.
- Use dbutils.secrets to access the Databricks Utilities secrets utility.
- No Databricks Utilities functionality other than the preceding utilities is available through dbutils.
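As a rough sketch of the points above, the following hypothetical helper takes a dbutils object (for example, WorkspaceClient().dbutils) and touches only the two supported utilities, fs and secrets. The function names, the volume path, and the secret scope name are assumptions made for illustration and are not part of Databricks Connect.

```python
def describe_workspace(dbutils, dir_path, scope):
    """Collect file names under dir_path and secret key names in scope.

    Hypothetical helper: it uses only dbutils.fs and dbutils.secrets, the two
    utilities that are available through WorkspaceClient.dbutils.
    """
    file_names = [info.name for info in dbutils.fs.ls(dir_path)]
    secret_keys = [meta.key for meta in dbutils.secrets.list(scope)]
    return {"files": file_names, "secret_keys": secret_keys}


def main():
    # Imported here so the helper above can be exercised without the SDK.
    from databricks.sdk import WorkspaceClient

    # Assumes DATABRICKS_HOST and DATABRICKS_TOKEN are already set.
    w = WorkspaceClient()
    # The volume path and scope name are placeholders; substitute your own.
    print(describe_workspace(w.dbutils, "/Volumes/main/default/my-volume", "my-scope"))
```

Calling main() runs the helper against a real workspace. Passing dbutils in as a parameter keeps the helper easy to try with a stand-in object first.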
Tip
You can also use the included Databricks SDK for Python to access any available Databricks REST API, not just the preceding Databricks Utilities APIs. See databricks-sdk on PyPI.
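As one illustration of this tip, the sketch below reads cluster names through the SDK's Clusters API by calling w.clusters.list(). The helper names are assumptions made for this example, and which clusters are returned depends on your workspace permissions.

```python
def cluster_names(w):
    """Return the sorted names of clusters visible to the given WorkspaceClient."""
    # w.clusters.list() yields cluster descriptions from the Clusters REST API.
    return sorted(c.cluster_name for c in w.clusters.list() if c.cluster_name)


def main():
    # Imported here so cluster_names can be tried with a stand-in client.
    from databricks.sdk import WorkspaceClient

    # Assumes DATABRICKS_HOST and DATABRICKS_TOKEN are already set.
    w = WorkspaceClient()
    for name in cluster_names(w):
        print(name)
```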
To initialize WorkspaceClient, you must provide enough information to authenticate the Databricks SDK with the workspace. For example, you can:
- Hard-code the workspace URL and your access token directly within your code. Although this option is supported, Databricks does not recommend it, because it can expose sensitive information, such as access tokens, if your code is checked into version control or otherwise shared. Initialize WorkspaceClient as follows:

  from databricks.sdk import WorkspaceClient

  w = WorkspaceClient(
      host = f"https://{retrieve_workspace_instance_name()}",
      token = retrieve_token()
  )
- Create or specify a configuration profile that contains the fields host and token, and then initialize WorkspaceClient as follows:

  from databricks.sdk import WorkspaceClient

  w = WorkspaceClient(profile = "<profile-name>")
- Set the environment variables DATABRICKS_HOST and DATABRICKS_TOKEN in the same way you set them for Databricks Connect, and then initialize WorkspaceClient as follows:

  from databricks.sdk import WorkspaceClient

  w = WorkspaceClient()
The Databricks SDK for Python does not recognize the SPARK_REMOTE environment variable for Databricks Connect.
For additional Azure Databricks authentication options for the Databricks SDK for Python, as well as how to initialize AccountClient within the Databricks SDKs to access available Databricks REST APIs at the account level instead of at the workspace level, see databricks-sdk on PyPI.
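The choice between a configuration profile and environment variables can be sketched as a small helper. The function name workspace_client_kwargs is hypothetical, and the sketch deliberately covers only the profile and environment-variable options described above. The SDK supports other authentication methods that this helper does not check for. It also fails early when only SPARK_REMOTE is set, because the SDK does not read that variable.

```python
import os


def workspace_client_kwargs(profile=None, env=None):
    """Pick keyword arguments for WorkspaceClient from the available settings.

    Hypothetical helper covering only two options: a named configuration
    profile, or the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables.
    """
    env = os.environ if env is None else env
    if profile:
        # A named configuration profile that contains host and token.
        return {"profile": profile}
    if env.get("DATABRICKS_HOST") and env.get("DATABRICKS_TOKEN"):
        # The SDK reads these variables itself, so no arguments are needed.
        return {}
    if env.get("SPARK_REMOTE"):
        raise RuntimeError(
            "SPARK_REMOTE is set, but the Databricks SDK for Python does not "
            "read it; set DATABRICKS_HOST and DATABRICKS_TOKEN or use a profile."
        )
    raise RuntimeError("None of the settings covered by this sketch were found.")
```

With the SDK installed, the result can be passed straight through, for example WorkspaceClient(**workspace_client_kwargs(profile="<profile-name>")).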
The following example shows how to use the Databricks SDK for Python to automate Databricks Utilities. This example creates a file named zzz_hello.txt in a Unity Catalog volume's path within the workspace, reads the data from the file, and then deletes the file. This example assumes that the environment variables DATABRICKS_HOST and DATABRICKS_TOKEN have already been set:
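from databricks.sdk import WorkspaceClient

# Reads DATABRICKS_HOST and DATABRICKS_TOKEN from the environment.
w = WorkspaceClient()

file_path = "/Volumes/main/default/my-volume/zzz_hello.txt"
file_data = "Hello, Databricks!"

fs = w.dbutils.fs

# Create the file, or overwrite it if it already exists.
fs.put(
    file = file_path,
    contents = file_data,
    overwrite = True
)

# Print the file's contents, and then delete the file.
print(fs.head(file_path))

fs.rm(file_path)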
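If reading the file fails in the preceding example, the file is never deleted. One possible variation, shown here as a sketch and not as the documented approach, wraps the read in try/finally so that cleanup always runs. The function name round_trip is an assumption. The fs argument is expected to be w.dbutils.fs or any object with the same put, head, and rm methods.

```python
def round_trip(fs, file_path, file_data):
    """Write file_data to file_path, read it back, and always delete the file."""
    fs.put(file=file_path, contents=file_data, overwrite=True)
    try:
        return fs.head(file_path)
    finally:
        # Runs even if head() raises, so the test file is not left behind.
        fs.rm(file_path)
```

For example, round_trip(w.dbutils.fs, "/Volumes/main/default/my-volume/zzz_hello.txt", "Hello, Databricks!") returns the text that was read back from the volume.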
See also Interaction with dbutils in the Databricks SDK for Python documentation.