@Rohit Dobbanaboina - Thanks for the question and for using the MS Q&A platform.
The error message indicates that there is an issue enumerating the directory: the LISTSTATUS operation failed with a java.net.UnknownHostException. You have also enabled Azure Data Lake Storage credential passthrough for user-level data access.
To resolve this issue, you can follow the steps below:
- Check if the storage account is accessible and the credentials are correct. You can verify this by trying to access the storage account using the Azure Storage Explorer or Azure Portal.
- Ensure that the storage account is mounted correctly. You can verify this by checking the mount point and the extra configurations used while mounting the storage account (see the mount-listing sketch after this list).
- Check if the user has the necessary permissions to access the storage account. You can verify this by checking the access control settings for the storage account.
- Ensure that DNS resolution is working correctly. The error message indicates that there is an issue resolving the hostname. You can verify this by checking the DNS settings for the Azure Databricks workspace (a quick resolution check is sketched after this list).
- Check that the firewall settings for the storage account are configured correctly and that the IP addresses of the Azure Databricks workspace are added to the allowed list.
- Check if the storage account is in the same region as the Azure Databricks workspace. If not, you may face latency issues while accessing the storage account.
- If the issue persists, you can unmount the existing connection, create a new mount, and see if that works (see the unmount/remount sketch after this list).
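For the mount check, here is a minimal sketch that lists the mounts defined in the workspace so you can confirm the source URI of the failing mount points at the expected .azuredatalakestore.net host; /mnt/<mount-name> is a placeholder for your actual mount name:
```python
# List every mount defined in the workspace and confirm the source URI
# of the suspect mount point ends in .azuredatalakestore.net.
for m in dbutils.fs.mounts():
    print(m.mountPoint, "->", m.source)

# Narrow the output down to the mount that is failing.
# "/mnt/<mount-name>" is a placeholder - replace it with your actual mount name.
print([m for m in dbutils.fs.mounts() if m.mountPoint == "/mnt/<mount-name>"])
```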
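For the DNS check, a quick way to confirm that the cluster can resolve the ADLS Gen1 hostname reported in the UnknownHostException is to resolve it directly from a notebook cell; <storage-account-name> is a placeholder:
```python
import socket

# Placeholder host - use the exact hostname shown in the UnknownHostException message.
host = "<storage-account-name>.azuredatalakestore.net"

try:
    print(host, "resolves to", socket.gethostbyname(host))
except socket.gaierror as e:
    # A failure here reproduces the UnknownHostException raised by the Hadoop ADL client.
    print("DNS resolution failed:", e)
```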
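For the last step, the unmount can also be done from a notebook; a minimal sketch, again using the placeholder mount name:
```python
# Remove the stale mount; ignore the error if the path is not currently mounted.
try:
    dbutils.fs.unmount("/mnt/<mount-name>")
except Exception as e:
    print("Unmount skipped:", e)

# Ask running clusters to refresh their mount cache before remounting.
dbutils.fs.refreshMounts()
```
After the unmount, recreate the mount using the credential passthrough example below.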
To mount an Azure Data Lake Storage Gen1 resource or a folder inside it, use the following commands:
configs = {
  "fs.adl.oauth2.access.token.provider.type": "CustomAccessTokenProvider",
  "fs.adl.oauth2.access.token.custom.provider": spark.conf.get("spark.databricks.passthrough.adls.tokenProviderClassName")
}

# Optionally, you can add <directory-name> to the source URI of your mount point.
dbutils.fs.mount(
  source = "adl://<storage-account-name>.azuredatalakestore.net/<directory-name>",
  mount_point = "/mnt/<mount-name>",
  extra_configs = configs)
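Once the mount is created, you can verify it by listing or reading files through the mount point; a small sketch, where the mount name and CSV path are placeholders used only for illustration:
```python
# List the files through the new mount point to confirm the passthrough
# identity can enumerate the directory.
display(dbutils.fs.ls("/mnt/<mount-name>"))

# Read a file through the mount; the CSV path is only an illustration.
df = spark.read.csv("/mnt/<mount-name>/<file-name>.csv", header=True)
df.show(5)
```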
As per the repro from our end, I created a new ADLS Gen1 account named cheprademo and a Databricks cluster with DBR 12.2 LTS (includes Apache Spark 3.3.2, Scala 2.12), with the Azure Data Lake Storage credential passthrough option Enable credential passthrough for user-level data access selected.
With the Azure Data Lake Storage Gen1 account in place, I'm able to run the code and list the files without any issue.
For more details, refer to https://learn.microsoft.com/en-us/azure/databricks/data-governance/credential-passthrough/adls-passthrough#--mount-azure-data-lake-storage-to-dbfs-using-credential-passthrough
Hope this helps. If this answers your query, do click Accept Answer and Yes for "Was this answer helpful". And, if you have any further queries, do let us know.