Cannot read excel file which is in using adls using load_workbook of openpyxl in databricks

Alpha 20 Reputation points
2024-05-10T10:34:09.57+00:00

Cannot read excel file which is in using load_workbook of openpyxl but can read if copied to dbfs

User's image

Azure Data Lake Storage
Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
1,380 questions
Azure Blob Storage
Azure Blob Storage
An Azure service that stores unstructured data in the cloud as blobs.
2,511 questions
Azure Databricks
Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.
1,985 questions
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
9,805 questions
Excel Management
Excel Management
Excel: A family of Microsoft spreadsheet software with tools for analyzing, charting, and communicating data.Management: The act or process of organizing, handling, directing or controlling something.
1,662 questions
{count} votes

1 answer

Sort by: Most helpful
  1. Suba Balaji 11,191 Reputation points
    2024-05-12T10:45:15.2633333+00:00

    Hello,

    The reason behind this issue is

    The open method (load_workbook) works only with local files - it doesn't know anything about abfss or other cloud storages.Please check this stack overflow link for similar question.

    Workaround:

    1. You can use https + sas key (generated from azure portal)
    2. Mounting the file to dbfs or do a copy as you have mentioned

    use dbutils.fs.cp to copy file from ADLS to local disk of driver node, and then work with it, like: dbutils.fs.cp("abfss:/....", "file:/tmp/my-copy")

    So whatever you are doing is the correct way of accessing the file.

    Hope it helps.

    0 comments No comments