Allowlist libraries and init scripts on shared compute
In Databricks Runtime 13.3 LTS and above, you can add libraries and init scripts to the allowlist in Unity Catalog. Allowlisted artifacts can then be used on compute configured with shared access mode.
To open the dialog for adding items to the allowlist in Catalog Explorer, do the following:
In your Azure Databricks workspace, click Catalog.
Click the gear icon to open the metastore details and permissions UI.
Select Allowed JARs/Init Scripts.
Click Add.
Important
This option is only displayed to sufficiently privileged users. If you cannot access the allowlist UI, contact your metastore admin for assistance in allowlisting libraries and init scripts.
Add an init script to the allowlist
Complete the following steps in the allowlist dialog to add an init script to the allowlist:
For Type, select Init Script.
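For Source Type, select Volume or the object storage protocol.
Specify the source path to add to the allowlist. See How are permissions on paths enforced in the allowlist?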
Add Maven coordinates to the allowlist
Complete the following steps in the allowlist dialog to add Maven coordinates to the allowlist:
For Type, select Maven.
For Source Type, select Coordinates.
Enter coordinates in the following format: groupId:artifactId:version.
To include all versions of a library, allowlist the following format: groupId:artifactId.
You can include all artifacts in a group by allowlisting the following format: groupId.
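The following Python sketch illustrates how the three coordinate formats above differ in scope. It is a hypothetical model, not the way Azure Databricks evaluates the allowlist: the function coordinate_is_allowed and the example coordinates are made up, and it assumes that an entry matches whole colon-separated segments.

```python
# Hypothetical illustration only: this is not how Databricks evaluates the
# allowlist. It assumes entries match whole colon-separated segments.


def coordinate_is_allowed(coordinate: str, allowlist: list[str]) -> bool:
    """Return True if a groupId:artifactId:version coordinate is covered by an
    allowlist entry of the form groupId, groupId:artifactId, or
    groupId:artifactId:version."""
    parts = coordinate.split(":")
    for entry in allowlist:
        entry_parts = entry.split(":")
        # A shorter entry covers every coordinate whose leading segments equal it.
        if parts[: len(entry_parts)] == entry_parts:
            return True
    return False


# Hypothetical allowlist entries, one per granularity level.
allowlist = [
    "com.example:parser:1.2.0",  # a single version of one library
    "com.example:client",        # all versions of one library
    "org.acme",                  # all artifacts in one group
]

for coordinate in [
    "com.example:parser:1.2.0",
    "com.example:parser:1.3.0",
    "com.example:client:2.0.1",
    "org.acme:metrics:0.4",
    "org.acmecorp:metrics:0.4",
]:
    print(coordinate, coordinate_is_allowed(coordinate, allowlist))
```

Under this model, allowlisting com.example:parser:1.2.0 does not cover version 1.3.0 of the same library, while the group entry org.acme covers any artifact in that group but not the unrelated group org.acmecorp.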
How are permissions on paths enforced in the allowlist?
You can use the allowlist to grant access to JARs or init scripts stored in Unity Catalog volumes and object storage. If you add a path for a directory rather than a file, allowlist permissions propagate to contained files and directories.
Prefix matching is used for all artifacts stored in Unity Catalog volumes or object storage. To prevent prefix matching at a given directory level, include a trailing slash (/). For example, /Volumes/prod-libraries/ does not perform prefix matching for other files or directories whose names merely begin with prod-libraries. Instead, only the files and directories within /Volumes/prod-libraries/ are added to the allowlist.
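To show why the trailing slash matters, the following Python sketch models the allowlist as plain string-prefix matching. It is a hypothetical illustration: the function path_is_allowed and the sibling path /Volumes/prod-libraries-dev are assumptions made for this example, and Azure Databricks may evaluate paths differently.

```python
# Hypothetical sketch of plain string-prefix matching; it is not Databricks'
# implementation, only an illustration of why the trailing slash matters.


def path_is_allowed(path: str, allowlist: list[str]) -> bool:
    """Return True if the path starts with any allowlisted prefix."""
    return any(path.startswith(prefix) for prefix in allowlist)


no_slash = ["/Volumes/prod-libraries"]
with_slash = ["/Volumes/prod-libraries/"]

inside = "/Volumes/prod-libraries/jars/app.jar"       # file inside the directory
sibling = "/Volumes/prod-libraries-dev/jars/app.jar"  # hypothetical sibling directory

print(path_is_allowed(inside, no_slash))     # True
print(path_is_allowed(sibling, no_slash))    # True: the bare prefix also matches the sibling
print(path_is_allowed(inside, with_slash))   # True
print(path_is_allowed(sibling, with_slash))  # False: the trailing slash stops the match
```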
You can define permissions at the following levels:
The base path for the volume or storage container.
A directory nested at any depth from the base path.
A single file.
Adding a path to the allowlist only permits that path to be used for init scripts or JAR installation; it does not grant access to the files themselves. Azure Databricks still checks for permissions to access data in the specified location.
The principal used must have READ VOLUME permissions on the specified volume. See SELECT.
In single user access mode, the identity of the assigned principal (a user or service principal) is used.
In shared access mode:
Libraries use the identity of the library installer.
Init scripts use the identity of the cluster owner.
Note
No-isolation shared access mode does not support volumes, but uses the same identity assignment as shared access mode.
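The identity rules above, combined with the requirement that the principal can read the volume, can be sketched in Python as follows. This is a hypothetical model: the Compute class, the identity_for and can_use functions, the access-mode strings, and the principal names are illustrative only and are not part of any Databricks API.

```python
# Hypothetical model of which identity is checked when an allowlisted artifact
# is read. Class, function, and string names are illustrative assumptions.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Compute:
    access_mode: str         # "single_user", "shared", or "no_isolation_shared"
    assigned_principal: str  # user or service principal for single user mode
    cluster_owner: str


def identity_for(compute: Compute, artifact_kind: str,
                 library_installer: Optional[str] = None) -> str:
    """Return the principal whose permissions are checked for an artifact.

    artifact_kind is "library" or "init_script".
    """
    if compute.access_mode == "single_user":
        # Single user access mode always uses the assigned principal.
        return compute.assigned_principal
    if compute.access_mode in ("shared", "no_isolation_shared"):
        if artifact_kind == "library":
            # Libraries use the identity of the library installer.
            if library_installer is None:
                raise ValueError("library_installer is required for libraries")
            return library_installer
        if artifact_kind == "init_script":
            # Init scripts use the identity of the cluster owner.
            return compute.cluster_owner
        raise ValueError(f"unknown artifact kind: {artifact_kind}")
    raise ValueError(f"unknown access mode: {compute.access_mode}")


def can_use(on_allowlist: bool, principal: str, read_volume_grants: set[str]) -> bool:
    """Both checks must pass: the path is allowlisted AND the principal
    holds READ VOLUME on the volume (modeled here as a set of principals)."""
    return on_allowlist and principal in read_volume_grants


shared = Compute("shared", assigned_principal="", cluster_owner="owner@example.com")
single = Compute("single_user", assigned_principal="sp-etl", cluster_owner="owner@example.com")

print(identity_for(shared, "library", library_installer="dev@example.com"))  # dev@example.com
print(identity_for(shared, "init_script"))                                   # owner@example.com
print(identity_for(single, "init_script"))                                   # sp-etl

grants = {"owner@example.com"}
print(can_use(True, identity_for(shared, "init_script"), grants))   # True
print(can_use(False, identity_for(shared, "init_script"), grants))  # False: not allowlisted
```

The can_use helper reflects the point made above: allowlisting a path is necessary but not sufficient, because the resolved principal must still be able to read the location.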
Databricks recommends configuring all object storage privileges related to init scripts and libraries with read-only permissions. Users with write permissions on these locations can potentially modify code in library files or init scripts.
Databricks recommends using Microsoft Entra ID service principals to manage access to JARs or init scripts stored in Azure Data Lake Storage Gen2. Use the following linked documentation to complete this setup:
(Optional) Refactor init scripts using azcopy or the Azure CLI.
Within your init scripts, you can reference environment variables set during cluster configuration to pass credentials stored as secrets for validation.
Note
Allowlist permissions for JARs and init scripts are managed separately. If you use the same location to store both types of objects, you must add the location to the allowlist for each.
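As a hypothetical illustration of keeping the two allowlists separate, the following Python sketch stores one list of path prefixes per artifact type. The allowlists dictionary, the is_allowed function, and the paths are assumptions made for this example, not Databricks internals.

```python
# Hypothetical model: JARs and init scripts have independent allowlists, so a
# shared location must be added to each one. Not Databricks' implementation.

allowlists: dict[str, list[str]] = {
    "jar": ["/Volumes/prod-libraries/"],
    "init_script": [],
}


def is_allowed(artifact_type: str, path: str) -> bool:
    """Check a path against the allowlist for one artifact type only."""
    return any(path.startswith(prefix) for prefix in allowlists[artifact_type])


jar = "/Volumes/prod-libraries/jars/app.jar"
script = "/Volumes/prod-libraries/scripts/setup.sh"

print(is_allowed("jar", jar))             # True
print(is_allowed("init_script", script))  # False: the JAR entry does not apply

# Add the same location to the init script allowlist as well.
allowlists["init_script"].append("/Volumes/prod-libraries/")
print(is_allowed("init_script", script))  # True
```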