Cluster-scoped init scripts are init scripts defined in a cluster configuration. Cluster-scoped init scripts apply to both clusters you create and those created to run jobs.
You can configure cluster-scoped init scripts using the UI, the CLI, and by invoking the Clusters API. This section focuses on performing these tasks using the UI. For the other methods, see the Databricks CLI and the Clusters API.
You can add any number of scripts, and the scripts are executed sequentially in the order provided.
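For example, when a cluster is defined as JSON for the Clusters API (or passed as JSON to the CLI), its init scripts are listed in an ordered init_scripts array. The following fragment is a hedged sketch rather than a full cluster definition, the script names are placeholders, and you should check the Clusters API reference for the exact schema:

"init_scripts": [
  { "volumes": { "destination": "/Volumes/<catalog>/<schema>/<volume>/install-drivers.sh" } },
  { "workspace": { "destination": "/Workspace/<path-to-script>/configure-proxy.sh" } }
]

Each entry names a single source type (such as volumes, workspace, or abfss) with a destination path, and the scripts run in the order they appear in the array.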
If a cluster-scoped init script returns a non-zero exit code, the cluster launch fails. You can troubleshoot cluster-scoped init scripts by configuring cluster log delivery and examining the init script log. See Init script logging.
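As a concrete illustration, the following is a minimal sketch of a cluster-scoped init script; the package it installs is only an example of an OS-level dependency a workload might need:

#!/bin/bash
# Minimal example of a cluster-scoped init script. It runs on each node during
# cluster startup, before the Spark driver or worker JVM starts.
set -e  # any failing command returns a non-zero exit code, which fails the cluster launch

# Install an OS-level dependency (the package name is illustrative).
apt-get update -y
apt-get install -y --no-install-recommends libsasl2-dev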
This section contains instructions for configuring a cluster to run an init script using the Azure Databricks UI.
Databricks recommends managing all init scripts as cluster-scoped init scripts. If you are using compute with standard or dedicated access mode (formerly shared and single user access modes), store init scripts in Unity Catalog volumes. If you are using compute with no-isolation shared access mode, use workspace files for init scripts.
For standard access mode, you must add init scripts to the allowlist. See Allowlist libraries and init scripts on compute with standard access mode (formerly shared access mode).
To use the UI to configure a cluster to run an init script, specify the path to the script in one of the following formats:
For a script stored as a workspace file: /Workspace/<path-to-script>/<script-name>.sh
For a script stored in a Unity Catalog volume: /Volumes/<catalog>/<schema>/<volume>/<path-to-script>/<script-name>.sh
For a script stored in Azure Data Lake Storage Gen2: abfss://container-name@storage-account-name.dfs.core.windows.net/path/to/init-script
The identity used to read the init script depends on the cluster's access mode:
In dedicated access mode, the identity of the assigned principal (a user or service principal) is used.
In standard access mode, the identity of the cluster owner is used.
Note
No-isolation shared access mode does not support volumes, but uses the same identity assignment as standard access mode.
To remove a script from the cluster configuration, click the trash icon at the right of the script. When you confirm the deletion, you are prompted to restart the cluster. Optionally, you can also delete the script file from the location where you uploaded it.
Note
If you configure an init script using the ABFSS source type, you must configure access credentials.
Databricks recommends using Microsoft Entra ID service principals to manage access to init scripts stored in Azure Data Lake Storage Gen2. Use the following linked documentation to complete this setup:
Create a service principal with read and list permissions on your desired blobs. See Access storage using a service principal & Microsoft Entra ID (Azure Active Directory).
Save your credentials using secrets. See Manage secrets.
Set the properties in the Spark configuration and environment variables when creating a cluster, as in the following example:
Spark config:
spark.hadoop.fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net OAuth
spark.hadoop.fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
spark.hadoop.fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net <application-id>
spark.hadoop.fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net {{secrets/<secret-scope>/<service-credential-key>}}
spark.hadoop.fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net https://login.microsoftonline.com/<tenant-id>/oauth2/token
Environment variables:
SERVICE_CREDENTIAL={{secrets/<secret-scope>/<service-credential-key>}}
(Optional) Refactor init scripts using azcopy or the Azure CLI.
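For example, azcopy can upload a local init script to the storage account after you authenticate (for instance with azcopy login); the local file name, storage account, container, and target path below are placeholders:

azcopy copy "./init-script.sh" "https://<storage-account-name>.blob.core.windows.net/<container-name>/path/to/init-script.sh"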
Within your init scripts, you can reference environment variables set during cluster configuration, which lets you pass credentials stored as secrets into the script.
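As a hedged sketch, an init script can first verify that the SERVICE_CREDENTIAL variable defined above was injected before using it; what the script then does with the credential depends on your workload:

#!/bin/bash
set -euo pipefail

# SERVICE_CREDENTIAL is populated from the secret reference configured on the cluster.
if [[ -z "${SERVICE_CREDENTIAL:-}" ]]; then
  echo "SERVICE_CREDENTIAL was not set on this cluster" >&2
  exit 1  # a non-zero exit code fails the cluster launch
fi

# Use the credential as needed, for example to authenticate a tool that downloads additional files.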
Warning
Cluster-scoped init scripts on DBFS are end-of-life. The DBFS option in the UI exists in some workspaces to support legacy workloads and is not recommended. All init scripts stored in DBFS should be migrated. For migration instructions, see Migrate init scripts from DBFS.