ADF - Dynamically load all excel files from blob storage to Azure SQL DB

Santhi Dhanuskodi 365 Reputation points
2023-10-25T17:14:45+00:00

Hi,

 

I have 10+ Excel files in an Azure Storage container. I want to load these files dynamically. The sheet name is different for each file. I can maintain the sheet-name details in a separate Azure SQL table, 'ExcelDetails'.

 

How do I read all the files dynamically, match them against the table data to fetch the sheet name, and load the data into SQL?

Also, I want to control the pipeline runs, maintain the activity runs in a table 'etlControlTable', and run the pipeline activities only for non-processed files.

 

I am not sure how to go about this. What activities do I need, and what should the pipeline flow be?

Please guide me.

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. Subashri Vasudevan 11,306 Reputation points Volunteer Moderator
    2023-10-27T11:55:35.2166667+00:00

    Hi Santhi Dhanuskodi,

    You could start building the pipeline with a lookup table in Azure SQL that has the structure below (or similar). This table holds the mapping of file name and sheet name to the target table name:

    filename,sheetname,tblname
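    As a minimal sketch, that mapping table could be created as below (column sizes are assumptions; the question calls the table 'ExcelDetails', while the sample query further down uses 'lookup' — use one name consistently):

    ```sql
    -- Maps each Excel file to the worksheet to read and the sink table.
    CREATE TABLE ExcelDetails (
        filename  NVARCHAR(400) NOT NULL PRIMARY KEY,  -- e.g. 'B.xlsx'
        sheetname NVARCHAR(128) NOT NULL,              -- worksheet to load
        tblname   NVARCHAR(128) NOT NULL               -- target table in Azure SQL
    );
    ```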

    Use a Get Metadata activity to dynamically get the file names. Note that I have used a wildcard in the filename portion of the dataset so that only .xlsx files are returned.

    [Screenshot: Get Metadata activity settings with a wildcard *.xlsx file name in the dataset]

    If your source folder has subfolders, you might have to add a Filter activity after the Get Metadata activity and use the expressions below to keep only files:

    Items: @activity('Get Metadata1').output.childItems
    Condition: @equals(item().type,'File')
    

    Next, use a ForEach activity pointing to the Get Metadata child items, or to the Filter output value if you used a filter.

    Inside the ForEach, add a Lookup activity and query the SQL table based on the file name being iterated. The Lookup query will look like this:

    select * from lookup where filename='@{item().name}'

    This gives you the sheet name and sink table name corresponding to the file being processed. The output of the Lookup will look like this:

    {
        "count": 1,
        "value": [
            {
                "filename": "B.xlsx",
                "sheetname": "ADF",
                "tblname": "table_b_adf"
            }
        ]
    }

    Lastly, add a Copy activity whose source points to an Excel dataset, as follows.

    Source dataset

    [Screenshot: Excel source dataset with parameterized file name and sheet name]

    Here, we have parameterized the file name and sheet name, as indicated by the arrows.
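    In case the screenshot does not render, the parameterized Excel dataset would look roughly like the sketch below (the linked service reference and container name are assumptions; adjust them to your storage account):

    ```json
    {
        "name": "Excel1",
        "properties": {
            "linkedServiceName": {
                "referenceName": "AzureBlobStorage1",
                "type": "LinkedServiceReference"
            },
            "parameters": {
                "filename":  { "type": "string" },
                "sheetname": { "type": "string" }
            },
            "type": "Excel",
            "typeProperties": {
                "location": {
                    "type": "AzureBlobStorageLocation",
                    "fileName": { "value": "@dataset().filename", "type": "Expression" },
                    "container": "sourcefiles"
                },
                "sheetName": { "value": "@dataset().sheetname", "type": "Expression" },
                "firstRowAsHeader": true
            }
        }
    }
    ```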

    Point to this dataset (Excel1) from the Source tab of the Copy activity, as follows.

    Source tab in copy activity

    [Screenshot: Copy activity Source tab passing the dataset parameters]

    Here, the sheet name comes from the Lookup output and the file name from the ForEach item().

    sheetname expression:

    @activity('Lookup1_copy1').output.value[0]['sheetname']
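    For the filename parameter, the corresponding expression is simply the item being iterated (this matches the name used in the Lookup query above):

    ```
    @item().name
    ```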
    

    Sink settings (mine is a Synapse dedicated pool, but it would be similar for Azure SQL except for the copy method). Make sure to select the auto-create table option.

    [Screenshot: Copy activity sink settings with the auto-create table option enabled]

    Expression used for table name:

    @activity('Lookup1_copy1').output.value[0]['tblname']
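    For the etlControlTable part of your question, one possible pattern (a sketch only; the table, column, and procedure names below are assumptions) is to log each file with a Stored Procedure activity after a successful copy, and drive the pipeline from a Lookup that returns only non-processed files:

    ```sql
    -- Tracks which files have already been loaded.
    CREATE TABLE etlControlTable (
        filename      NVARCHAR(400) NOT NULL,
        status        NVARCHAR(20)  NOT NULL,   -- e.g. 'Processed'
        processedTime DATETIME2     NULL
    );
    GO

    -- Initial Lookup query: only files without a 'Processed' entry.
    SELECT d.filename, d.sheetname, d.tblname
    FROM ExcelDetails d
    WHERE NOT EXISTS (
        SELECT 1 FROM etlControlTable c
        WHERE c.filename = d.filename AND c.status = 'Processed'
    );
    GO

    -- Called from a Stored Procedure activity on copy success,
    -- passing @{item().name} as the parameter value.
    CREATE PROCEDURE usp_MarkFileProcessed @filename NVARCHAR(400)
    AS
    INSERT INTO etlControlTable (filename, status, processedTime)
    VALUES (@filename, 'Processed', SYSUTCDATETIME());
    ```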
    

    Hope this helps you get started. Please let us know if you have any queries while trying it out.

    Thanks

    1 person found this answer helpful.
