Init scripts with mounted Azure Data Lake Storage Gen2

Sharukh Kundagol 145 Reputation points
2023-03-20T18:13:26.5766667+00:00

I'm trying to access an init script stored on Azure Data Lake Storage Gen2 that is mounted to DBFS.

 

I mounted the storage so the script is available at

dbfs:/mnt/storage/container/script.sh
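A mount like this is typically created with dbutils.fs.mount. A minimal sketch, assuming OAuth with a service principal (the storage account, container, secret scope, and key names below are placeholders, not the exact values used):

```python
# Hypothetical mount call -- "storage_name", "container", the secret scope and
# key names are placeholders, not values from the original post.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id":
        dbutils.secrets.get(scope="my-scope", key="sp-client-id"),
    "fs.azure.account.oauth2.client.secret":
        dbutils.secrets.get(scope="my-scope", key="sp-client-secret"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

dbutils.fs.mount(
    source="abfss://container@storage_name.dfs.core.windows.net/",
    mount_point="/mnt/storage/container",
    extra_configs=configs,
)
```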

 

and when I try to access it as an init script, I get this error:

Cluster scoped init script dbfs:/mnt/storage/container/script.sh failed: Timed out with exception after 5 attempts (debugStr = 'Reading remote file for init script'), Caused by: java.io.FileNotFoundException: /WORKSPACE_ID/mnt/storage/container/script.sh: No such file or directory.

 

  1. I can see this file in DBFS using the %sh magic command in a notebook.
  2. I can read from this path using spark.read... (both checks are sketched below).
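For reference, both checks can be reproduced from a running notebook roughly like this (paths taken from the question):

```python
# 1. List the file through the DBFS mount (the %sh equivalent would be
#    `ls /dbfs/mnt/storage/container/script.sh`).
display(dbutils.fs.ls("dbfs:/mnt/storage/container/script.sh"))

# 2. Read the same file through Spark to confirm the mount is readable.
spark.read.text("dbfs:/mnt/storage/container/script.sh").show(truncate=False)
```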

 

 

In the docs I found:

https://docs.databricks.com/dbfs/unity-catalog.html#use-dbfs-while-launching-unity-catalog-clusters-with-single-user-access-mode

 

Databricks recommends using DBFS mounts for init scripts, configurations, and libraries stored in external storage. This behavior is not supported in shared access mode.

 

 

When I try to access this file using abfss://, I get this error:

 

Failure to initialize configuration for storage account storage_name.dfs.core.windows.net: Invalid configuration value detected for fs.azure.account.key, Caused by: Invalid configuration value detected for fs.azure.account.key.)

 

but I used the same credentials as in the "mount credentials" of the previous approach.

 

Do init scripts have any limitations with mounted DBFS paths?

 

I am also concerned about the workspace ID prepended to the path in the error message.

 

I'm using exactly the same path that I get from this command:

dbutils.fs.ls("/mnt/storage/container/script.sh")

 

I assume that when this path is resolved for the init script, the cluster is not yet running, so it cannot reach ADLS through the mount. So I should use abfss:// instead.

 

But how do I authenticate with this storage? I tried the approach described here:

https://learn.microsoft.com/en-us/azure/databricks/storage/azure-storage#--access-azure-data-lake-storage-gen2-or-blob-storage-using-oauth-20-with-an-azure-service-principal

 

using a service principal in the Spark config, but it doesn't work.
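For reference, the service-principal configuration from that doc looks roughly like this in a notebook (a sketch; the storage account, application ID, tenant ID, and secret scope/key are placeholders). Note that for a cluster-scoped init script these keys would have to be available in the cluster's Spark config rather than set from a notebook, since the script is fetched before any notebook code runs:

```python
# Sketch of the OAuth (service principal) configuration from the linked doc.
# <storage-account>, <application-id>, <tenant-id> and the secret scope/key are placeholders.
service_credential = dbutils.secrets.get(scope="<scope>", key="<service-credential-key>")

spark.conf.set("fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net",
               "<application-id>")
spark.conf.set("fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net",
               service_credential)
spark.conf.set("fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net",
               "https://login.microsoftonline.com/<tenant-id>/oauth2/token")
```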

 

Can you please suggest a way forward?

Azure Storage
Azure Databricks

Answer accepted by question author
  1. PRADEEPCHEEKATLA 91,496 Reputation points Moderator
    2023-03-21T08:22:31.0833333+00:00


    Sharukh Kundagol - Thanks for the question and for using the MS Q&A platform.

    Azure Databricks scans the reserved location /databricks/init for legacy global init scripts. Databricks recommends you avoid storing init scripts in this location to avoid unexpected behavior.

    It is recommended to copy the script files from the ADLS Gen2 storage account to a DBFS folder such as /databricks/scripts.

    Here is a sample script that helps create the init script in a DBFS folder:
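    A minimal sketch of such a script, assuming the mount path from the question and the /databricks/scripts target folder mentioned above; adjust the paths to your environment:

```python
# Create a DBFS folder for init scripts and copy the script from the ADLS Gen2 mount.
dbutils.fs.mkdirs("dbfs:/databricks/scripts/")
dbutils.fs.cp("dbfs:/mnt/storage/container/script.sh",
              "dbfs:/databricks/scripts/script.sh")

# Verify the copy, then reference dbfs:/databricks/scripts/script.sh
# as the cluster-scoped init script in the cluster configuration.
display(dbutils.fs.ls("dbfs:/databricks/scripts/"))
```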

    For more details, refer to Cluster node initialization scripts.

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, do click Accept Answer and Yes for "Was this answer helpful?". And if you have any further queries, do let us know.

