I want to retrieve Excel data with synapse notebook activity

NishimuraChinatsu-9854 756 Reputation points
2022-06-30T05:28:02.177+00:00

In synapse notebook activity, I want to use python to extract only sheets 2-4 in Excel and convert them to CSV files respectively.

I was able to extract only one specific sheet and convert it to a csv file, but I cannot output multiple sheets to a csv file respectively.

Since the Excel file is a macro-enabled file, you cannot use the macro-enabled file unless it is a notebook activity.
For your convenience, you cannot change a macro file to a regular file. So, I would like to know the python code that can convert multiple sheets of Excel into csv files in notebook activity.

I'm sorry that I have not studied enough and it is difficult to understand in the translated English.

Regards

216395-image.png

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
4,696 questions
0 comments No comments
{count} votes

Accepted answer
  1. MartinJaffer-MSFT 26,061 Reputation points
    2022-06-30T20:57:04.39+00:00

    Hello @NishimuraChinatsu-9854 ,
    Thanks for the question and using MS Q&A platform.

    As I understand, you want to convert .xlsm -> csv. You want to do this in Python in Synapse notebook. You only want to do some pages of excel.

    Python by itself does not read excel. So you need a library. Synapse comes with pandas library. We are interested in its read_excel. We also need DataFrame.to_csv.

    Code could look something like

    import pandas  
    path = "abfs[s]://file_system_name@account_name.dfs.core.windows.net/file_path/"  
    df = pandas.read_excel(path+"myFile.xlsm", [1,2] )  
    for frame in df:  
        frame.to_csv(path+frame.name)  
    

    If you do not want to use pandas, I did a search for a library which can do xlsm. pyexcel You will need to do installation to use this library.

    Specifically, alter code example for extracting a sheet from a book.

    Code could look something like

    from pyexcel.cookbook import extract_a_sheet_from_a_book  
    
    source_filepath = "//data//mybook.xlsm"  
    sheet_names_to_extract = ["Sheet 2", "special sheet number 3"]  
    
    for sheetname in sheet_names_to_extract:  
        extract_a_sheet_from_a_book(source_filepath, sheetname, "out_"+sheetname+".csv")  
    

    Code not tested.

    Please do let me if you have any queries.

    Thanks
    Martin


    • Please don't forget to click on 130616-image.png or upvote 130671-image.png button whenever the information provided helps you. Original posters help the community find answers faster by identifying the correct answer. Here is how
    • Want a reminder to come back and check responses? Here is how to subscribe to a notification
      • If you are interested in joining the VM program and help shape the future of Q&A: Here is how you can be part of Q&A Volunteer Moderators

0 additional answers

Sort by: Most helpful