Azure Synapse Workspace - How to read an Excel file from Data Lake Gen2 using Pandas or PySpark?

Cristina Santana Souza 61 Reputation points
2020-09-12T16:06:02.257+00:00

Hi,

In Azure Synapse Workspace is it possible to read an Excel file from Data Lake Gen2 using Pandas/PySpark? If so, can you show an example, please?

Example:

import pandas as pd

file_path = '/dbfs/mnt/raw/2020/06/01/file.xlsx' or 'abfss://raw@dlsname.dfs.core.windows.net/2020/06/01/file.xlsx'
df = pd.read_excel(file_path)

Best regards,
Cristina

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
4,435 questions
0 comments No comments
{count} votes

Accepted answer
  1. PRADEEPCHEEKATLA-MSFT 78,911 Reputation points Microsoft Employee
    2020-09-14T06:20:05.933+00:00

    Hello @Cristina Santana Souza ,

    Welcome to Microsoft Q&A platform.

    The method pandas.read_excel does not support using wasbs or abfss scheme URL to access the file. For more details, please refer pandas.read_excel.
    So if you want to access the file with pandas, I suggest you create a sas token and use https scheme with sas token to access the file or download the file as stream then read it with pandas.

    Steps to read excel file from Azure Synapse notebooks:

    Step1: Create SAS token via Azure portal.

    Select your Azure Storage account => Under settings => Click on Shared access signature

    24374-image.png

    Step2: Read excel file from Azure Data Lake Storage gen2.

    ReadExcel=pd.read_excel('https://<account name>.dfs.core.windows.net/<file system>/<path>?<sas token>')  
    print(ReadExcel)  
    

    24317-image.png

    Hope this helps. Do let us know if you any further queries.

    ----------------------------------------------------------------------------------------

    Do click on "Accept Answer" and Upvote on the post that helps you, this can be beneficial to other community members.

    3 people found this answer helpful.

1 additional answer

Sort by: Most helpful
  1. Banoth Hussain 1 Reputation point
    2022-06-15T08:58:28.703+00:00

    would this work for Azure Databricks notebooks as well?

    I tried the same approach but when I try to write an excel file it basically says "No engine for filetype: 'xlsx?sv=xxxxxxxxxxxx'