The following is true of my setup:
- The cluster has its spark config set to apply the data lake's endpoint and account key.
- I have pre-deployed system topics & queue (via IaC ARM template YAML deployments) which are successfully receiving events. The example here is named 'queue1'.
The following masked & anonymised PySpark code errors with
Error while reading file abfss:REDACTED_LOCAL_PART@datalake.dfs.core.windows.net/<folder_name>/2022/07/14/<file_name>.json
Invalid configuration value detected for fs.azure.account.key.
Caused by: Invalid configuration value detected for fs.azure.account.key
The schema variable is in preceding code but returns a valid struct-based schema.
#cloudFiles config
cloudFiles_cfg = {
"cloudFiles.subscriptionId": "61******-****-****-****-***********7",
"cloudFiles.tenantId": "14******--****-****-****-***********f",
"cloudFiles.clientId": "07******-****-****-****-***********e",
"cloudFiles.clientSecret": "***************************",
"cloudFiles.resourceGroup": "rg-datahub",
"cloudFiles.connectionString" : "BlobEndpoint=https://datalake.blob.core.windows.net/;QueueEndpoint=https://datalake.queue.core.windows.net/;FileEndpoint=https://datalake.file.core.windows.net/;TableEndpoint=https://datalake.table.core.windows.net/;SharedAccessSignature=sv=2021-06-08&ss=bfqt&srt=sco&sp=rwdlacupx&se=2032-07-14T20:01:35Z&st=2022-07-14T12:01:35Z&spr=https&sig=***************************************",
"cloudFiles.storageAccount": "datalake",
"cloudFiles.format": "json",
"cloudFiles.useNotifications": "true",
"cloudFiles.queueName": "queue1",
}
incoming = (spark.readStream
.format("cloudFiles")
.options(**cloudFiles_cfg)
.schema(schema)
.load()
)
display(incoming)
On executing the stream, the follow behaviour occurs
- The stream initialises successfully.
- If the queue is empty, the stream continues to poll happily, returning blank results.
- As soon as a message is added to the queue, the stream processes the message but fails returning the error.
- After the stream has failed, the message is nonetheless dequeued.
Looking for reasons why this error would occur & potential resolutions.