Synapse Workspace | pandas.read_excel is not working on a new cluster

Tamashevich, Tatsiana 85 Reputation points
2024-08-04T11:26:30.25+00:00

Hello,

I have a new Spark pool running Spark version 3.4.

I was trying to read an Excel file via pandas but got an error (posted as screenshots).

Basically, from the same path I'm able to read a CSV file but not an Excel file, which makes me think the problem is connected with the pandas API.

Could you please give me a hint on how to resolve the issue (apart from using the crealytics package)?

Thank you in advance!

Azure Synapse Analytics

Accepted answer
  1. PRADEEPCHEEKATLA-MSFT 89,816 Reputation points Microsoft Employee
    2024-08-07T15:22:36.4+00:00

    @Tamashevich, Tatsiana - I'm glad that you were able to resolve your issue, and thank you for posting your solution so that others experiencing the same thing can easily reference it! Since the Microsoft Q&A community has a policy that "The question author cannot accept their own answer. They can only accept answers by others", I'll repost your solution in case you'd like to accept the answer.

    Ask: Synapse Workspace | pandas.read_excel is not working on a new cluster

    Solution: The issue is resolved. I found another nice solution: mounting the file path, after which pandas.read_excel could read the file.
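The original screenshot of the mount code is not available, so the following is only a minimal sketch of how mounting might look in a Synapse notebook. It assumes the `mssparkutils.fs.mount` and `mssparkutils.env.getJobId` utilities and the `/synfs/<job id>/<mount point>` local path convention; the helper names, the linked service name, and all paths are hypothetical placeholders, not the author's actual code.

```python
def synfs_local_path(job_id, mount_point, relative_path):
    """Build the local path under which a mounted file is exposed.

    Assumes the /synfs/<job id>/<mount point>/<file> layout used by Synapse mounts.
    """
    return f"/synfs/{job_id}/{mount_point.strip('/')}/{relative_path.lstrip('/')}"


def read_excel_from_mount(source, mount_point, linked_service, relative_path):
    """Mount an ADLS Gen2 location and read an Excel file from it with pandas.

    All arguments are placeholders, e.g. source could look like
    "abfss://<file system>@<account name>.dfs.core.windows.net".
    """
    # Imported lazily: notebookutils is only available inside a Synapse session.
    from notebookutils import mssparkutils
    import pandas as pd

    # Mount the storage location using a linked service for authentication.
    mssparkutils.fs.mount(source, mount_point, {"linkedService": linked_service})

    # Mounted files are addressed through a local path scoped to the current job.
    job_id = mssparkutils.env.getJobId()
    return pd.read_excel(synfs_local_path(job_id, mount_point, relative_path))
```

Because the mounted file is addressed through an ordinary local path, pandas never sees an abfss URL, which is presumably why this approach avoids the original error.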


    If I missed anything please let me know and I'd be happy to add it to my answer, or feel free to comment below with any additional information.

    If you have any other questions, please let me know. Thank you again for your time and patience throughout this issue.


    Please don’t forget to Accept Answer and Yes for "was this answer helpful" wherever the information provided helps you, this can be beneficial to other community members.


2 additional answers

  1. Tamashevich, Tatsiana 85 Reputation points
    2024-08-07T12:52:45.0833333+00:00

    Hi @PRADEEPCHEEKATLA-MSFT @Vinodh247 ,

    Thank you for your support.

    I found another nice solution: mounting the file path resolved the issue.


  2. PRADEEPCHEEKATLA-MSFT 89,816 Reputation points Microsoft Employee
    2024-08-05T03:31:31.8866667+00:00

    @Tamashevich, Tatsiana Thanks for the question and for using the MS Q&A platform.

    The method pandas.read_excel does not support using a wasbs or abfss scheme URL to access the file. For more details, please refer to the pandas.read_excel documentation. So if you want to access the file with pandas, I suggest you either create a SAS token and use the https scheme with the SAS token to access the file, or download the file as a stream and then read it with pandas.

    Steps to read excel file from Azure Synapse notebooks:

    Step 1: Create a SAS token via the Azure portal.

    Select your Azure Storage account => Under Settings => Click on Shared access signature


    Step 2: Read the Excel file from Azure Data Lake Storage Gen2.

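    import pandas as pd
    # Replace the <...> placeholders with your account name, file system, file path, and SAS token.
    ReadExcel = pd.read_excel('https://<account name>.dfs.core.windows.net/<file system>/<path>?<sas token>')
    print(ReadExcel)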
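The other option mentioned above, downloading the file as a stream and then reading it with pandas, could be sketched roughly as follows. This is an illustration under assumptions rather than a tested Synapse recipe: `build_sas_url` and `read_excel_via_stream` are hypothetical helper names, the URL parts are placeholders, and reading .xlsx files assumes an Excel engine such as openpyxl is installed on the pool.

```python
import io
import urllib.request


def build_sas_url(account, file_system, path, sas_token):
    """Compose the https URL for a file in ADLS Gen2 with a SAS token appended."""
    return (
        f"https://{account}.dfs.core.windows.net/"
        f"{file_system}/{path.lstrip('/')}?{sas_token.lstrip('?')}"
    )


def read_excel_via_stream(url, **read_excel_kwargs):
    """Download the file into memory and hand the bytes to pandas.read_excel."""
    # Imported lazily so the URL helper can be used without pandas installed.
    import pandas as pd

    # Fetch the whole file over https; the SAS token in the URL authorizes the request.
    with urllib.request.urlopen(url) as response:
        buffer = io.BytesIO(response.read())

    # pandas accepts a file-like object, so no URL scheme support is needed here.
    return pd.read_excel(buffer, **read_excel_kwargs)
```

A call might look like `read_excel_via_stream(build_sas_url("<account name>", "<file system>", "<path>", "<sas token>"), sheet_name=0)`. Keeping the download separate from the parsing makes it easier to tell an access problem (the request fails) from a parsing problem (pandas fails).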
    


    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.

