Databricks CLI setup & documentation
The Databricks command-line interface (CLI) provides an easy-to-use interface to the Azure Databricks platform. The open source project is hosted on GitHub. The CLI is built on top of the Databricks REST API and is organized into command groups based on primary endpoints.
You can use the Databricks CLI to do things such as:
- Provision compute resources in Azure Databricks workspaces.
- Run data processing and data analysis tasks.
- List, import, and export notebooks and folders in workspaces.
Important
This CLI is under active development and is released as an Experimental client. This means that interfaces are still subject to change.
Set up the CLI
This section lists CLI requirements and describes how to install and configure your environment to run the CLI.
Requirements
- Python 3: 3.6 and above
- Python 2: 2.7.9 and above
Important
On macOS, the default Python 2 installation does not implement the TLSv1_2 protocol, and running the CLI with this Python installation results in the error: AttributeError: 'module' object has no attribute 'PROTOCOL_TLSv1_2'. Use Homebrew to install a version of Python that has ssl.PROTOCOL_TLSv1_2.
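To check whether your Python installation supports TLSv1_2 before installing the CLI, a minimal sketch (the python executable name may differ on your system):
# Prints True if this Python has ssl.PROTOCOL_TLSv1_2, False otherwise
python -c "import ssl; print(hasattr(ssl, 'PROTOCOL_TLSv1_2'))"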
Limitations
Using the Databricks CLI with firewall-enabled storage containers is not supported. Databricks recommends that you use Databricks Connect or az storage instead.
Install the CLI
Run pip install databricks-cli using the appropriate version of pip for your Python installation:
pip install databricks-cli
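If pip on your system points at Python 2, one way to invoke pip through a specific Python 3 interpreter instead is sketched below (the executable name python3 is an assumption about your environment):
# Install the CLI using the pip that belongs to your Python 3 installation
python3 -m pip install databricks-cli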
Update the CLI
Run pip install databricks-cli --upgrade using the appropriate version of pip for your Python installation:
pip install databricks-cli --upgrade
To list the version of the CLI that is currently installed, run databricks --version (or databricks -v):
databricks --version
# Or...
databricks -v
Set up authentication
Note
As a security best practice, when authenticating with automated tools, systems, scripts, and apps, Databricks recommends you use access tokens belonging to service principals instead of workspace users. To create access tokens for service principals, see Manage access tokens for a service principal.
Before you can run CLI commands, you must set up authentication. To authenticate to the CLI you can use a Databricks personal access token or an Azure Active Directory (Azure AD) token.
Note
By default, the following commands create a configuration profiles file named .databrickscfg with a configuration profile named DEFAULT in this new file. If the .databrickscfg file already exists, that file's DEFAULT configuration profile is overwritten with the new data. To create a configuration profile with a different name instead, see Connection profiles.
Set up authentication using an Azure AD token
To configure the CLI using an Azure AD token, generate the Azure AD token and store it in the environment variable DATABRICKS_AAD_TOKEN.
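One way to do this, sketched here on the assumption that you have the Azure CLI (az) installed and are signed in with az login (the resource ID below is the well-known programmatic ID for Azure Databricks):
# Request an Azure AD access token for the Azure Databricks resource
# and store it in the environment variable the Databricks CLI expects.
export DATABRICKS_AAD_TOKEN=$(az account get-access-token \
  --resource 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d \
  --query accessToken --output tsv)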
Run the following command:
databricks configure --aad-token
The command issues the prompt:
Databricks Host (should begin with https://):
Enter your per-workspace URL, with the format https://adb-<workspace-id>.<random-number>.azuredatabricks.net. To get the per-workspace URL, see Per-workspace URL.
After you complete the prompt, your access credentials are stored in the file ~/.databrickscfg on Unix, Linux, or macOS, or %USERPROFILE%\.databrickscfg on Windows. The file contains a default profile entry:
[DEFAULT]
host = <workspace-URL>
token = <Azure-AD-token>
Set up authentication using a Databricks personal access token
To configure the CLI to use a personal access token, run the following command:
databricks configure --token
The command begins by issuing the prompt:
Databricks Host (should begin with https://):
Enter your per-workspace URL, with the format https://adb-<workspace-id>.<random-number>.azuredatabricks.net. To get the per-workspace URL, see Per-workspace URL.
The command continues by issuing the prompt to enter your personal access token:
Token:
After you complete the prompts, your access credentials are stored in the file ~/.databrickscfg on Unix, Linux, or macOS, or %USERPROFILE%\.databrickscfg on Windows. The file contains a default profile entry:
[DEFAULT]
host = <workspace-URL>
token = <personal-access-token>
For CLI 0.8.1 and above, you can change the path of this file by setting the environment variable DATABRICKS_CONFIG_FILE.
Unix, Linux, macOS
export DATABRICKS_CONFIG_FILE=<path-to-file>
Windows
setx DATABRICKS_CONFIG_FILE "<path-to-file>" /M
Important
Beginning with CLI 0.17.2, the CLI does not work with a .netrc file. You can have a .netrc file in your environment for other purposes, but the CLI will not use that .netrc file.
CLI 0.8.0 and above supports the following Azure Databricks environment variables:
DATABRICKS_HOST
DATABRICKS_TOKEN
An environment variable setting takes precedence over the setting in the configuration file.
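For example, a minimal sketch for Unix-style shells (the placeholder values are yours to substitute):
# Set the workspace URL and token; these take precedence over the
# corresponding entries in the .databrickscfg configuration file.
export DATABRICKS_HOST=https://adb-<workspace-id>.<random-number>.azuredatabricks.net
export DATABRICKS_TOKEN=<personal-access-token>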
Test your authentication setup
To check whether you set up authentication correctly, you can run a command such as the following, replacing <someone@example.com> with your Azure Databricks workspace username:
databricks workspace ls /Users/<someone@example.com>
If successful, this command lists the objects in the specified workspace path.
Connection profiles
The Databricks CLI configuration supports multiple connection profiles. You can use the same installation of the Databricks CLI to make API calls across multiple Azure Databricks workspaces.
To add a connection profile, specify a unique name for the profile:
databricks configure [--token | --aad-token] --profile <profile-name>
The .databrickscfg file contains a corresponding profile entry:
[<profile-name>]
host = <workspace-URL>
token = <token>
To use the connection profile:
databricks <group> <command> --profile <profile-name>
If --profile <profile-name> is not specified, the default profile is used. If a default profile is not found, you are prompted to configure the CLI with a default profile.
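For example, a .databrickscfg file holding the default profile plus one additional profile might look like the following sketch (the profile name STAGING is illustrative):
[DEFAULT]
host = <workspace-URL-1>
token = <token-1>

[STAGING]
host = <workspace-URL-2>
token = <token-2>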
Test your connection profiles
To check whether you set up your connection profiles correctly, you can run a command such as the following, replacing <someone@example.com> with your Azure Databricks workspace username and <DEFAULT> with one of your connection profile names:
databricks workspace ls /Users/<someone@example.com> --profile <DEFAULT>
If successful, this command lists the objects in the specified workspace path in the workspace for the specified connection profile. Run this command for each connection profile that you want to test.
Alias command groups
Sometimes it can be inconvenient to prefix each CLI invocation with the name of a command group, for example databricks workspace ls. To make the CLI easier to use, you can alias command groups to shorter commands. For example, to shorten databricks workspace ls to dw ls in the Bourne Again Shell, you can add alias dw="databricks workspace" to the appropriate bash profile. Typically, this file is located at ~/.bash_profile.
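A minimal sketch of adding and using the alias, assuming bash and the default profile location:
# Append the alias to your bash profile and reload it
echo 'alias dw="databricks workspace"' >> ~/.bash_profile
source ~/.bash_profile

# Now this is equivalent to: databricks workspace ls /Users/<someone@example.com>
dw ls /Users/<someone@example.com>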
Tip
Azure Databricks already aliases databricks fs to dbfs; databricks fs ls and dbfs ls are equivalent.
Use the CLI
This section shows you how to get CLI help, parse CLI output, and invoke commands in each command group.
Display CLI command group help
You list the subcommands for any command group by running databricks <group> --help (or databricks <group> -h). For example, you list the DBFS CLI subcommands by running databricks fs -h:
databricks fs -h
Display CLI subcommand help
You list the help for a subcommand by running databricks <group> <subcommand> --help (or databricks <group> <subcommand> -h). For example, you list the help for the DBFS copy files subcommand by running databricks fs cp -h:
databricks fs cp -h
Use jq to parse CLI output
Some Databricks CLI commands output the JSON response from the API endpoint. Sometimes it can be useful to parse out parts of the JSON to pipe into other commands. For example, to copy a job definition, you must take the settings field of a databricks jobs get command and use that as an argument to the databricks jobs create command, as sketched below. In these cases, we recommend using the utility jq.
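A sketch of that job-copy workflow, assuming jq is installed (the job ID 233 and the file name job-settings.json are illustrative):
# Extract the settings field of an existing job...
databricks jobs get --job-id 233 | jq .settings > job-settings.json
# ...and create a new job from those settings.
databricks jobs create --json-file job-settings.json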
For example, the following command prints the settings of the job with the ID of 233.
databricks jobs list --output JSON | jq '.jobs[] | select(.job_id == 233) | .settings'
{
"name": "Quickstart",
"new_cluster": {
"spark_version": "7.5.x-scala2.12",
"spark_env_vars": {
"PYSPARK_PYTHON": "/databricks/python3/bin/python3"
},
"num_workers": 8,
...
},
"email_notifications": {},
"timeout_seconds": 0,
"notebook_task": {
"notebook_path": "/Quickstart"
},
"max_concurrent_runs": 1
}
As another example, the following command prints the names and IDs of all available clusters in the workspace:
databricks clusters list --output JSON | jq '[ .clusters[] | { name: .cluster_name, id: .cluster_id } ]'
[
{
"name": "My Cluster 1",
"id": "1234-567890-grip123"
},
{
"name": "My Cluster 2",
"id": "2345-678901-patch234"
}
]
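If you want just the values for use in shell scripts, jq's -r flag prints raw strings instead of JSON. A small sketch:
# Print one cluster ID per line, without JSON quoting
databricks clusters list --output JSON | jq -r '.clusters[].cluster_id'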
You can install jq, for example, on macOS using Homebrew with brew install jq or on Windows using Chocolatey with choco install jq. For more information on jq, see the jq Manual.
JSON string parameters
String parameters are handled differently depending on your operating system:
Unix, Linux, macOS
You must enclose JSON string parameters in single quotes. For example:
databricks jobs run-now --job-id 9 --jar-params '["20180505", "alantest"]'
Windows
You must enclose JSON string parameters in double quotes, and the quote characters inside the string must be preceded by \. For example:
databricks jobs run-now --job-id 9 --jar-params "[\"20180505\", \"alantest\"]"
Troubleshooting
The following sections provide tips for troubleshooting common issues with the Databricks CLI.
Using EOF with databricks configure does not work
For Databricks CLI 0.12.0 and above, using the end of file (EOF) sequence in a script to pass parameters to the databricks configure command does not work. For example, the following script causes the Databricks CLI to ignore the parameters, and no error message is thrown:
# Do not do this.
databricksUrl=<per-workspace-url>
databricksToken=<personal-access-token-or-Azure-AD-token>
databricks configure --token << EOF
$databricksUrl
$databricksToken
EOF
To fix this issue, do one of the following:
- Use one of the other programmatic configuration options as described in Set up authentication.
- Manually add the host and token values to the .databrickscfg file as described in Set up authentication (see the sketch after this list).
- Downgrade your installation of the Databricks CLI to 0.11.0 or below, and run your script again.
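For the second option, a minimal sketch of writing the .databrickscfg file directly from a script (the variables mirror the broken example above):
# Write the configuration file directly instead of piping values
# into databricks configure. Note: this replaces the entire file.
cat > ~/.databrickscfg << EOF
[DEFAULT]
host = $databricksUrl
token = $databricksToken
EOF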