Hi Swarnava, here's how I usually do it, “by hand”:
Assign UAMI to the cluster
In the Azure Portal, go to your Databricks workspace → “Clusters” → choose the cluster → “Configuration” → “Advanced” → “Identity” tab, enable User Assigned Managed Identity and add dbmanagedidentity.
Verify on the cluster
Start the cluster and check in the startup logs that the identity is attached (you should see a message like “Managed identity with client ID … attached”).
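If you'd rather check from a notebook than dig through logs, you can query the Azure Instance Metadata Service (IMDS) token endpoint directly. A minimal sketch, assuming the driver node can reach IMDS and <CLIENT_ID> is the UAMI's client ID:

# Sanity check: request a storage token from IMDS for the UAMI.
# <CLIENT_ID> is a placeholder; a 200 response means the identity is attached.
import requests

resp = requests.get(
    "http://169.254.169.254/metadata/identity/oauth2/token",
    params={
        "api-version": "2018-02-01",
        "resource": "https://storage.azure.com/",
        "client_id": "<CLIENT_ID>",
    },
    headers={"Metadata": "true"},  # IMDS rejects requests without this header
    timeout=5,
)
resp.raise_for_status()
print("Token acquired, expires_on:", resp.json()["expires_on"])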
Prepare configs for ABFS
In a notebook, define a dict like this (replace <ACCOUNT>, <CLIENT_ID>, and <CONTAINER> with your values):
# OAuth via the ABFS driver's managed-identity token provider
# (the Hadoop class is MsiTokenProvider), using the UAMI's client ID.
configs = {
    "fs.azure.account.auth.type.<ACCOUNT>.dfs.core.windows.net": "OAuth",
    "fs.azure.account.oauth.provider.type.<ACCOUNT>.dfs.core.windows.net":
        "org.apache.hadoop.fs.azurebfs.oauth2.MsiTokenProvider",
    "fs.azure.account.oauth2.client.id.<ACCOUNT>.dfs.core.windows.net":
        "<CLIENT_ID>"
}
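Side note: if you'd rather skip mounting entirely, the same keys can be set on the Spark session and you can read abfss:// paths directly. A quick sketch with the same placeholders:

# Alternative to mounting: apply the configs at session level and
# address the container by its abfss:// URI (same placeholders as above).
for key, value in configs.items():
    spark.conf.set(key, value)

df = spark.read.parquet(
    "abfss://<CONTAINER>@<ACCOUNT>.dfs.core.windows.net/folder/file.parquet"
)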
Mount the filesystem
dbutils.fs.mount(
    source = "abfss://<CONTAINER>@<ACCOUNT>.dfs.core.windows.net/",
    mount_point = "/mnt/mydata",
    extra_configs = configs
)
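One thing to watch: dbutils.fs.mount fails if the mount point already exists, so before the call above I usually add a small guard:

# Unmount first if /mnt/mydata is already mounted (e.g. with stale configs),
# so the mount call doesn't error out on an existing mount point.
if any(m.mountPoint == "/mnt/mydata" for m in dbutils.fs.mounts()):
    dbutils.fs.unmount("/mnt/mydata")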
Use it
display(dbutils.fs.ls("/mnt/mydata"))
df = spark.read.parquet("/mnt/mydata/folder/file.parquet")
From there, dbmanagedidentity (granted the Storage Blob Data Contributor role on the storage account) authenticates every call to ADLS Gen2 automatically, with no keys or secrets involved.
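Writes work the same way, anything the role permits goes through that identity. For example (the /mnt/mydata/output/ path is just an illustration):

# Writes are authenticated by the same managed identity; Storage Blob
# Data Contributor covers both reads and writes on the container.
df.write.mode("overwrite").parquet("/mnt/mydata/output/")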