question

GopinathRajee-8127 asked · MartinJaffer-MSFT edited

Azure Databricks - Access Azure Data Lake Storage Gen2 using OAuth 2.0 with an Azure service principal

All,

I tried setting the connection details at the cluster level based on the following link, and it works. But it requires me to put the secret directly in the configuration. Am I missing something? How can I make this work without having to specify the secret in plain text?

The "replace" section of the documentation also has an extra line that needs to be removed:



Replace <service-credential-key-name> with the name of the key containing the client secret.




spark.hadoop.fs.azure.account.auth.type.<storage-account-name>.dfs.core.windows.net OAuth
spark.hadoop.fs.azure.account.oauth.provider.type.<storage-account-name>.dfs.core.windows.net org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
spark.hadoop.fs.azure.account.oauth2.client.id.<storage-account-name>.dfs.core.windows.net <application-id>
spark.hadoop.fs.azure.account.oauth2.client.secret.<storage-account-name>.dfs.core.windows.net <service-credential>
spark.hadoop.fs.azure.account.oauth2.client.endpoint.<storage-account-name>.dfs.core.windows.net https://login.microsoftonline.com/<directory-id>/oauth2/token
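For context, once those properties are applied on the cluster, notebooks can address the account with abfss:// URIs. A small sketch of building such a URI — the helper name and the container/account values are illustrative, not part of any Databricks API:

```python
def abfss_path(container: str, storage_account: str, relative_path: str) -> str:
    """Build an abfss:// URI for an ADLS Gen2 path (illustrative helper)."""
    return (f"abfss://{container}@{storage_account}.dfs.core.windows.net/"
            f"{relative_path.lstrip('/')}")

# In a Databricks notebook you would then read directly, e.g.:
# df = spark.read.parquet(abfss_path("data", "mystorage", "raw/events"))
```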






azure-databricks · azure-data-lake-storage

1 Answer

MartinJaffer-MSFT answered · MartinJaffer-MSFT edited

Hello @GopinathRajee-8127,
Thanks for the question and using MS Q&A platform.

As I understand, the ask is to get the credentials out of the cluster configuration and store them securely elsewhere: the configuration should point to the credentials without exposing them. This is specific to the RDD option, where the details are specified in the cluster configuration, as opposed to in the notebook like all the other options.
Please look at how to reference a secret in a Spark configuration property. You will first need to set up the secrets, as referenced later in this post.

Do note that secrets in Spark configuration properties are in Public Preview and available in Databricks Runtime 6.4 Extended Support and above. Link
Please read the details, as there are still security concerns with this method: any notebook on the cluster can retrieve the secret, because notebooks can read configuration properties, and the value is not redacted there.
As such, I highly recommend you use a different method to specify your connection instead of this RDD-and-cluster-config approach.

spark.<property-name> {{secrets/<scope-name>/<secret-name>}}
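Putting the two together, the client-secret line from the cluster configuration in the question would become the following (the scope name my-scope and secret name sp-client-secret are hypothetical; substitute your own):

```
spark.hadoop.fs.azure.account.oauth2.client.secret.<storage-account-name>.dfs.core.windows.net {{secrets/my-scope/sp-client-secret}}
```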

Info on secrets in general:

Link to Secret Management in Databricks.

In Databricks the mechanism for this is called Secret Scopes. There are two options for where to store the secrets: Azure Key Vault-backed secret scopes and Databricks-backed secret scopes.
In both cases, the code to fetch the secret is the same: dbutils.secrets.get(scope = "myScopeName", key = "mySecretName")
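As a sketch of the notebook-side alternative recommended above, the same ABFS properties can be set per session with the secret fetched through dbutils. The helper function, the scope/secret names, and the placeholder IDs are all hypothetical, not an official API:

```python
def adls_oauth_conf(storage_account, application_id, directory_id, service_credential):
    """Build the ABFS OAuth Spark session conf for one ADLS Gen2 account.

    Note: session-level keys omit the spark.hadoop. prefix used in
    cluster-level configuration.
    """
    suffix = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{suffix}": application_id,
        f"fs.azure.account.oauth2.client.secret.{suffix}": service_credential,
        f"fs.azure.account.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{directory_id}/oauth2/token",
    }

# In a Databricks notebook (not runnable outside Databricks):
# secret = dbutils.secrets.get(scope="my-scope", key="sp-client-secret")
# for k, v in adls_oauth_conf("mystorage", "<application-id>",
#                             "<directory-id>", secret).items():
#     spark.conf.set(k, v)
```

This way the secret only ever lives in the secret scope; the notebook fetches it at run time and it never appears in the cluster configuration.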

Link to secret workflow.
Link to workflow specific to ADLS Gen2 and OAuth2.


Please do let me know if you have any queries.

Thanks
Martin


  • Please don't forget to click "Accept Answer" or upvote whenever the information provided helps you. Original posters help the community find answers faster by identifying the correct answer. Here is how

  • Want a reminder to come back and check responses? Here is how to subscribe to a notification

  • If you are interested in joining the VM program and help shape the future of Q&A: Here is how you can be part of Q&A Volunteer Moderators



