Issue reading delta in storage account gen-2

Ferreira, Jeniffer (NAZ-V) 0 Reputation points
2023-03-22T19:37:19.9766667+00:00

I have been reading a Delta table from an Azure Data Lake Storage Gen2 account on Azure Databricks with Spark. Today the command stopped working: it just keeps running and I eventually get a timeout error:
spark.read.format('delta').load('<path to my table>')


2 answers

  1. BhargavaGunnam-MSFT 26,136 Reputation points Microsoft Employee
    2023-03-28T15:23:06.1533333+00:00

    Hello Ferreira, Jeniffer (NAZ-V),

    Welcome to the MS Q&A platform.

    Below are the potential root causes for your issue.

    • Network connectivity issues: a timeout can be caused by connectivity problems between your Databricks cluster and the storage account (for example, a storage firewall, virtual network rule, or private endpoint blocking the cluster). Check whether any network issue could be causing the timeout; a quick reachability check is sketched after this list.
    • Large Delta table size: if the Delta table is large, loading it can take a long time and hit the timeout. In this case, you can try increasing the timeout settings for the Spark read operation.
    • Insufficient resources: if your Databricks cluster does not have enough resources, it may not be able to load the Delta table within the timeout period. Try increasing the cluster size or using a more powerful instance type.
    • Permissions: if the credentials used to access the storage account lack sufficient permissions, the read can fail or hang. Verify the storage account permissions and the credentials configured on the cluster; an example service-principal configuration is included in the sketch below.
    • Version incompatibility: the version of the Delta Lake library you are using may be incompatible with the version of Spark you are running, causing the command to fail.
    • Transient issue: try running the command again after a few minutes to see if the issue resolves itself.
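
    To narrow down whether the problem is connectivity/permissions or the Delta read itself, a quick check like the one below can help. This is a minimal sketch for a Databricks notebook; the storage account, container, tenant, secret scope, and path names are placeholders you would replace with your own values, and the service-principal block is only needed if your cluster is not already configured for the storage account.

    # Minimal connectivity/permission check from a Databricks notebook.
    # <storage-account>, <container>, <tenant-id> and the secret scope/key names
    # are placeholders, not values from this thread.

    # Optional: authenticate to ADLS Gen2 with a service principal
    # (standard Hadoop ABFS OAuth settings).
    spark.conf.set("fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net", "OAuth")
    spark.conf.set("fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net",
                   "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
    spark.conf.set("fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net",
                   dbutils.secrets.get(scope="<secret-scope>", key="<sp-client-id-key>"))
    spark.conf.set("fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net",
                   dbutils.secrets.get(scope="<secret-scope>", key="<sp-client-secret-key>"))
    spark.conf.set("fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net",
                   "https://login.microsoftonline.com/<tenant-id>/oauth2/token")

    # If listing the table directory fails or hangs, the problem is connectivity or
    # permissions rather than the Delta read itself.
    path = "abfss://<container>@<storage-account>.dfs.core.windows.net/<path-to-delta-table>"
    dbutils.fs.ls(path)

    # If listing succeeds quickly, retry the Delta read.
    df = spark.read.format("delta").load(path)
    df.printSchema()

    If dbutils.fs.ls also hangs, focus on networking (firewall rules, private endpoints, VNet configuration) and on the credentials the cluster uses; if it succeeds, the bottleneck is more likely table size or cluster resources.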

    If the issue still persists, you can try increasing the relevant timeout values. Note that spark.sql.execution.arrow.maxRecordsPerBatch controls the Arrow batch size used when converting between Spark and pandas; it does not change any timeout. Network-related timeouts are instead governed by settings such as spark.network.timeout, as sketched below.
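
    These timeout-related properties must be set before the Spark application starts, so on Databricks they go into the cluster's Spark configuration (cluster > Advanced options > Spark) rather than into the notebook. The values below are illustrative assumptions, not recommendations:

    # Cluster Spark config (applied at cluster start); values are examples only.
    # spark.network.timeout 600s
    # spark.executor.heartbeatInterval 60s   (must stay well below spark.network.timeout)

    # After restarting the cluster, confirm the values and retry the read:
    print(spark.conf.get("spark.network.timeout", "120s"))            # Spark default is 120s
    print(spark.conf.get("spark.executor.heartbeatInterval", "10s"))

    df = spark.read.format("delta").load(
        "abfss://<container>@<storage-account>.dfs.core.windows.net/<path-to-delta-table>"
    )
    df.show(10)  # small action to verify the read completes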

    I hope this helps. Please let us know if you have any further questions.


  2. Deleted

    This answer has been deleted due to a violation of the Code of Conduct.

