What files can I reference in an init script?

Support for referencing other files from an init script depends on where the referenced files are stored. This article outlines this behavior and provides recommendations.

Databricks recommends managing all init scripts as cluster-scoped init scripts.

What identity is used to run init scripts?

In single user access mode, the identity of the assigned principal (a user or service principal) is used.

In shared access mode or no-isolation shared access mode, init scripts use the identity of the cluster owner.

Not all locations for storing init scripts are supported on all Databricks Runtime versions and access modes. See Where can init scripts be installed?.

Can I reference files in Unity Catalog volumes from init scripts?

You can reference libraries and init scripts stored in Unity Catalog volumes from init scripts stored in Unity Catalog volumes.

Important

Credentials required to access other files stored in Unity Catalog volumes are only made available within init scripts stored in Unity Catalog volumes. You cannot reference any files in Unity Catalog volumes from init scripts configured from other locations.

For clusters with shared access mode, only the configured init script needs to be added to the allowlist. Access to other files referenced in the init script is governed by Unity Catalog.
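As a minimal sketch, an init script stored in a Unity Catalog volume can install a library stored in a Unity Catalog volume by its /Volumes path. The catalog, schema, volume, and file names below are placeholders, not values from this article:

    #!/bin/bash
    # Hypothetical init script stored in a Unity Catalog volume.
    # It installs a Python wheel stored in a Unity Catalog volume.
    # The /Volumes/<catalog>/<schema>/<volume>/... path is a placeholder;
    # substitute your own catalog, schema, volume, and file names.
    set -euo pipefail

    WHEEL_PATH="/Volumes/<catalog>/<schema>/<volume>/my_package-1.0-py3-none-any.whl"

    # Install the wheel into the cluster's Python environment.
    /databricks/python/bin/pip install "${WHEEL_PATH}"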

Can I reference workspace files from init scripts?

In Databricks Runtime 11.3 LTS and above, you can reference other workspace files, such as libraries, configuration files, or shell scripts, from init scripts stored as workspace files.
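As a minimal sketch, an init script stored as a workspace file might source a shared shell script and install dependencies from a requirements file that are also stored as workspace files. The /Workspace/Shared/init/ paths are placeholders, not paths from this article:

    #!/bin/bash
    # Hypothetical init script stored as a workspace file
    # (Databricks Runtime 11.3 LTS and above).
    # The /Workspace/Shared/init/ paths are placeholders; substitute your own.
    set -euo pipefail

    # Source shared environment settings stored as a workspace file.
    source /Workspace/Shared/init/common-env.sh

    # Install Python dependencies listed in a workspace file.
    /databricks/python/bin/pip install -r /Workspace/Shared/init/requirements.txt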

Can I reference files in cloud object storage from init scripts?

You can reference libraries and init scripts stored in cloud object storage from init scripts.

For clusters with shared access mode, only the configured init script needs to be added to the allowlist. Access to other files referenced in the init script is determined by access configured to cloud object storage.

Databricks recommends using Microsoft Entra ID service principals to manage access to libraries and init scripts stored in Azure Data Lake Storage Gen2. Use the following linked documentation to complete this setup:

  1. Create a service principal with read and list permissions on your desired blobs. See Access storage using a service principal & Microsoft Entra ID (Azure Active Directory).

  2. Save your credentials using secrets. See Secrets.

  3. Set the properties in the Spark configuration and environment variables while creating a cluster, as in the following example:

    Spark config:

    spark.hadoop.fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net OAuth
    spark.hadoop.fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
    spark.hadoop.fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net <application-id>
    spark.hadoop.fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net {{secrets/<secret-scope>/<service-credential-key>}}
    spark.hadoop.fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net https://login.microsoftonline.com/<tenant-id>/oauth2/token
    

    Environment variables:

    SERVICE_CREDENTIAL={{secrets/<secret-scope>/<service-credential-key>}}
    
  4. (Optional) Refactor init scripts using azcopy or the Azure CLI.

    You can reference environment variables set during cluster configuration within your init scripts to pass credentials stored as secrets, as in the sketch below.
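
    The following minimal sketch assumes the Azure CLI is available on the cluster nodes and uses the SERVICE_CREDENTIAL environment variable configured above. The application ID, tenant ID, storage account, container, and file names are placeholders:

    #!/bin/bash
    # Hypothetical init script: authenticate with a Microsoft Entra ID service
    # principal and download a library from Azure Data Lake Storage Gen2.
    # <application-id>, <tenant-id>, <storage-account>, and the blob names
    # are placeholders; substitute your own values.
    set -euo pipefail

    # Log in using the client secret passed through the SERVICE_CREDENTIAL
    # environment variable set in the cluster configuration.
    az login --service-principal \
      --username "<application-id>" \
      --password "$SERVICE_CREDENTIAL" \
      --tenant "<tenant-id>" \
      --allow-no-subscriptions

    # Download the library from the storage account and install it.
    az storage blob download \
      --account-name "<storage-account>" \
      --container-name "libs" \
      --name "my_package-1.0-py3-none-any.whl" \
      --file "/tmp/my_package-1.0-py3-none-any.whl" \
      --auth-mode login

    /databricks/python/bin/pip install /tmp/my_package-1.0-py3-none-any.whl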