Invalid configuration value detected for fs.azure.account.key using Azure Databricks autoloader

Thomas Bailey 11 Reputation points
2022-07-14T12:46:28.457+00:00

The following is true of my setup:

  1. The cluster has its Spark config set to apply the data lake's endpoint and account key (a sketch of this pattern follows the list).
  2. I have pre-deployed the system topics & queue (via IaC ARM template deployments from YAML pipelines), and they are successfully receiving events. The example queue here is named 'queue1'.
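
For context, the account-key side of point 1 follows the standard pattern below. This is a sketch only, shown as notebook-level spark.conf.set calls rather than the cluster UI's key-value lines; the secret scope and key names are placeholders:

# Account-key auth for ADLS Gen2, with the key held in a secret scope
spark.conf.set(
    "fs.azure.account.key.datalake.dfs.core.windows.net",
    dbutils.secrets.get(scope="<scope-name>", key="<storage-account-key-name>"))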

The following masked and anonymised PySpark code fails with:

Error while reading file abfss:REDACTED_LOCAL_PART@datalake.dfs.core.windows.net/<folder_name>/2022/07/14/<file_name>.json
Invalid configuration value detected for fs.azure.account.key.
Caused by: Invalid configuration value detected for fs.azure.account.key

The schema variable is defined in preceding code and returns a valid struct-based schema.
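
For illustration only, schema is a StructType along these lines; the field names here are placeholders, not the real ones:

from pyspark.sql.types import StructType, StructField, StringType

# Placeholder schema; the real fields depend on the incoming JSON
schema = StructType([
    StructField("id", StringType(), True),
    StructField("eventTime", StringType(), True),
    StructField("payload", StringType(), True),
])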

#cloudFiles config
cloudFiles_cfg = {
  "cloudFiles.subscriptionId": "61******-****-****-****-***********7",
  "cloudFiles.tenantId": "14******-****-****-****-***********f",
  "cloudFiles.clientId": "07******-****-****-****-***********e",
  "cloudFiles.clientSecret": "***************************",
  "cloudFiles.resourceGroup": "rg-datahub",
  "cloudFiles.connectionString": "BlobEndpoint=https://datalake.blob.core.windows.net/;QueueEndpoint=https://datalake.queue.core.windows.net/;FileEndpoint=https://datalake.file.core.windows.net/;TableEndpoint=https://datalake.table.core.windows.net/;SharedAccessSignature=sv=2021-06-08&ss=bfqt&srt=sco&sp=rwdlacupx&se=2032-07-14T20:01:35Z&st=2022-07-14T12:01:35Z&spr=https&sig=***************************************",
  "cloudFiles.storageAccount": "datalake",
  "cloudFiles.format": "json",
  "cloudFiles.useNotifications": "true",
  "cloudFiles.queueName": "queue1",
}

incoming = (spark.readStream
              .format("cloudFiles")
              .options(**cloudFiles_cfg)
              .schema(schema)
              .load()
           )
display(incoming)
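
As an aside (unrelated to the failure), the client secret and the SAS token in cloudFiles.connectionString would normally come from a secret scope rather than being inlined; for example, assuming a scope and key name:

# Hypothetical secret-scope lookup instead of an inline secret
cloudFiles_cfg["cloudFiles.clientSecret"] = dbutils.secrets.get(
    scope="<scope-name>", key="<client-secret-key-name>")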

On executing the stream, the following behaviour occurs:

  1. The stream initialises successfully.
  2. If the queue is empty, the stream continues to poll happily, returning blank results.
  3. As soon as a message is added to the queue, the stream processes it but fails with the error above.
  4. Even after the stream has failed, the message is nonetheless dequeued.

I'm looking for reasons why this error occurs and for potential resolutions.


1 answer

Thomas Bailey 11 Reputation points
2022-07-14T13:00:17.86+00:00

OK, a few minutes later I found the answer (I keep doing this).

It turns out an account key alone is not sufficient for the abfss protocol here, so I've added the following OAuth (service principal) configs:

# Authenticate to ADLS Gen2 over abfss with a service principal via OAuth 2.0
spark.conf.set(
    "fs.azure.account.auth.type.<storage-account-name>.dfs.core.windows.net",
    "OAuth")
spark.conf.set(
    "fs.azure.account.oauth.provider.type.<storage-account-name>.dfs.core.windows.net",
    "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(
    "fs.azure.account.oauth2.client.id.<storage-account-name>.dfs.core.windows.net",
    "<application-id>")
# The client secret is read from a Databricks secret scope rather than inlined
spark.conf.set(
    "fs.azure.account.oauth2.client.secret.<storage-account-name>.dfs.core.windows.net",
    dbutils.secrets.get(scope="<scope-name>", key="<service-credential-key-name>"))
spark.conf.set(
    "fs.azure.account.oauth2.client.endpoint.<storage-account-name>.dfs.core.windows.net",
    "https://login.microsoftonline.com/<directory-id>/oauth2/token")

This returns data now.
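
For anyone landing here: before re-running the stream, a quick sanity check that the OAuth config is being picked up is to list the landing folder directly over abfss (the container name below is a placeholder):

# Should succeed without the fs.azure.account.key error if OAuth is configured
display(dbutils.fs.ls("abfss://<container>@datalake.dfs.core.windows.net/"))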