Multiple Excel workbooks with multiple sheets of the same structure - merge and convert to CSV files based on the number of sheets per workbook.

Sitaram Lanka 0 Reputation points
2023-11-23T22:03:02.6166667+00:00
I need to process multiple Excel workbooks with multiple sheets; all workbooks contain the same number of sheets with the same structure, i.e., sheet1 has the same structure in all workbooks, as do sheet2, sheet3, and so on.

I want to convert them to CSV and merge all sheet 1s from all workbooks, all sheet 2s from all workbooks, and so on, so the number of final output CSV files equals the number of sheets per workbook.


Example: 
workbook1 name - India (sheet1 - batters, sheet2 - bowlers, sheet3 - coaching staff) 
workbook2 name - Australia  (sheet1 - batters, sheet2 - bowlers, sheet3 - coaching staff) 
.
.
workbook5 name - New Zealand (sheet1 - batters, sheet2 - bowlers, sheet3 - coaching staff) 

Output:
   Batters.csv -> Workbook1.sheet1 + Workbook2.sheet1 + .... + Workbook5.sheet1 
   Bowlers.csv -> Workbook1.sheet2 + Workbook2.sheet2 + .... + Workbook5.sheet2  
   Coaching Staff.csv -> Workbook1.sheet3 + Workbook2.sheet3 + .... + Workbook5.sheet3  
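For reference, the desired merge can be sketched in Python with pandas (illustrative only; the file names are assumed from the example, and `pd.read_excel(path, sheet_name=None)` returns a dict of sheet name to DataFrame):

```python
import pandas as pd

def merge_workbooks(workbooks):
    """Merge same-named sheets across workbooks.

    `workbooks` is a list of dicts mapping sheet name -> DataFrame,
    i.e. the shape returned by pd.read_excel(path, sheet_name=None)
    for each file. Returns sheet name -> concatenated DataFrame.
    """
    merged = {}
    for book in workbooks:
        for sheet_name, frame in book.items():
            merged.setdefault(sheet_name, []).append(frame)
    return {name: pd.concat(frames, ignore_index=True)
            for name, frames in merged.items()}

# Usage against real files might look like (paths assumed):
# books = [pd.read_excel(p, sheet_name=None)
#          for p in ("India.xlsx", "Australia.xlsx", "New Zealand.xlsx")]
# for name, frame in merge_workbooks(books).items():
#     frame.to_csv(f"{name}.csv", index=False)
```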
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. Nandan Hegde 30,951 Reputation points MVP
    2023-11-24T03:21:28.02+00:00

    Hey,

    Since you know the number of sheets within each Excel workbook remains constant, you can proceed with the flow below:

    1. Create a variable whose value is the number of sheets per workbook (3 in this example), or an array variable with values [0,1,2] matching the sheet indexes.
    2. Use the Get Metadata activity to get the Excel files within the folder you need to process.
    3. Use a ForEach activity to iterate over those files.
      1. Within the ForEach activity, call an Execute Pipeline activity.
          1. The child pipeline has its own ForEach activity whose iteration count equals the number of sheets in an Excel workbook. The parameters passed from the parent pipeline to the child pipeline are the file path and the number of sheets it contains.
        


    You can parameterize the sheetIndex property of the Excel dataset.
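    A trimmed sketch of what the child pipeline's ForEach over sheet indexes could look like in pipeline JSON (activity and dataset names here are assumptions, and this is not a complete pipeline definition):

    ```json
    {
      "name": "ForEachSheet",
      "type": "ForEach",
      "typeProperties": {
        "items": {
          "value": "@range(0, pipeline().parameters.sheetCount)",
          "type": "Expression"
        },
        "activities": [
          {
            "name": "CopySheetToCsv",
            "type": "Copy",
            "inputs": [
              {
                "referenceName": "ExcelSource",
                "type": "DatasetReference",
                "parameters": { "sheetIndex": "@item()" }
              }
            ]
          }
        ]
      }
    }
    ```

    Here `@range(0, pipeline().parameters.sheetCount)` generates the sheet indexes to iterate over, and `@item()` feeds the current index into the parameterized dataset.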

    1. Once all the Excel sheets have been converted to CSV files in their respective folders, use another ForEach activity in the main pipeline after the first one, iterating over the number-of-sheets variable, since the task now is to merge the CSV files for each sheet index into a single file.

    To merge, follow a similar approach to:

    https://www.c-sharpcorner.com/article/merge-multiple-json-files-via-synapse-data-factory-pipelines/
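    The merge step (one combined CSV per sheet index) can also be sketched outside ADF in Python with pandas; the folder layout used here is an assumption:

    ```python
    import glob
    import pandas as pd

    def merge_csv_folder(folder, out_path):
        """Concatenate every CSV file in `folder` into a single CSV at `out_path`."""
        paths = sorted(glob.glob(f"{folder}/*.csv"))
        frames = [pd.read_csv(p) for p in paths]
        pd.concat(frames, ignore_index=True).to_csv(out_path, index=False)

    # Assumed layout: one folder per sheet index, e.g. csv/sheet0, csv/sheet1, ...
    # for idx in range(3):
    #     merge_csv_folder(f"csv/sheet{idx}", f"merged_sheet{idx}.csv")
    ```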