How can I load data from two different sheets in a single Excel file and split them into two separate tables in a data pipeline in Azure Data Factory?

Nima Pour · 2024-09-13

I have an Excel file stored in Azure Blob Storage called Group Mapping Master.xlsx that contains two sheets: DK and SE. Each sheet contains data that needs to be loaded into two separate tables in a SQL Server database: financemappingDK and financemappingSE.

Currently, both sheets are defined as separate entries in the SourceProperties table, and the pipeline loads them simultaneously. However, since they are part of the same Excel file, this sometimes leads to concurrency issues, causing instability during the data load process.

Our goal is to keep the flexibility of a generic Excel data pipeline that dynamically processes multiple Excel files and tables, while also ensuring that Group Mapping Master.xlsx is handled correctly and its sheets are never loaded concurrently. At the same time, the pipeline needs to remain adaptable in case these tables are deactivated or other tables/files are introduced in the future.

The Group Mapping Master load is a bit unstable: the data doesn't always load correctly. This is likely a concurrency issue, since the DK and SE sheets live in the same file, so the new pipeline reads that one file twice at the same time.

Our configuration (SourceProperties) table in SQL DB:

[Screenshot: SourceProperties configuration table]

Excel load pipeline in ADF: [Screenshot: Excel load pipeline]


Accepted answer
  Vinodh247 (MVP) · 2024-09-13

    Hi Nima Pour,

    Thanks for reaching out to Microsoft Q&A.

    I would suggest the following to handle the concurrency issues that occur when the sheets of a single Excel file are loaded at the same time.

    1. Activity for DK Sheet:
      • In the pipeline, create a copy activity specifically for the DK sheet.
      • Define the source as the DK sheet from the Excel file.
      • Set the destination to the financemappingDK table in SQL Server.
    2. Activity for SE Sheet:
      • Similarly, create another copy activity for the SE sheet.
      • Define the source as the SE sheet from the Excel file.
      • Set the destination to the financemappingSE table in SQL Server.
    3. Sequential Execution:
      • Chain the two copy activities together with an activity dependency, so that the SE sheet is loaded only after the DK sheet has loaded successfully. In the pipeline, connect the SE copy activity to the DK copy activity on the Succeeded condition (a minimal sketch follows this list).
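
    For illustration, here is a minimal pipeline JSON sketch of that layout. The dataset names (ds_excel_groupmapping, ds_sql_financemapping) and their parameters are assumptions made up for this example, not your actual datasets, and I have used ExcelSource/SqlServerSink source and sink types; if the target is Azure SQL Database the sink type would be AzureSqlSink instead. The important part is the dependsOn entry on the SE copy activity, which forces the two copies to run one after the other.

    ```json
    {
      "name": "pl_load_group_mapping_master",
      "properties": {
        "activities": [
          {
            "name": "CopyDKSheet",
            "type": "Copy",
            "inputs": [
              { "referenceName": "ds_excel_groupmapping", "type": "DatasetReference",
                "parameters": { "sheetName": "DK" } }
            ],
            "outputs": [
              { "referenceName": "ds_sql_financemapping", "type": "DatasetReference",
                "parameters": { "tableName": "financemappingDK" } }
            ],
            "typeProperties": {
              "source": { "type": "ExcelSource" },
              "sink": { "type": "SqlServerSink" }
            }
          },
          {
            "name": "CopySESheet",
            "type": "Copy",
            "dependsOn": [
              { "activity": "CopyDKSheet", "dependencyConditions": [ "Succeeded" ] }
            ],
            "inputs": [
              { "referenceName": "ds_excel_groupmapping", "type": "DatasetReference",
                "parameters": { "sheetName": "SE" } }
            ],
            "outputs": [
              { "referenceName": "ds_sql_financemapping", "type": "DatasetReference",
                "parameters": { "tableName": "financemappingSE" } }
            ],
            "typeProperties": {
              "source": { "type": "ExcelSource" },
              "sink": { "type": "SqlServerSink" }
            }
          }
        ]
      }
    }
    ```

    With this setup the Excel file is never read by two activities at the same time, while the rest of your generic pipeline can stay as it is.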

    For future proofing and flexibility, you can use a Lookup or If Condition activity to dynamically check whether the DK or SE sheets should be loaded (based on your SourceProperties table) or whether any sheets need to be skipped (e.g. if a sheet is deactivated or doesn't exist). Also make sure retry policies and error handling are configured on both copy activities; this mitigates transient issues during the data load and improves reliability. A sketch of this pattern is below.
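
    As a rough sketch of that idea (the column name IsActive, the dataset ds_sql_sourceproperties, the Azure SQL source type, and the query are assumptions, since I can only guess at your SourceProperties schema): a Lookup activity reads the flag, an If Condition decides whether to run the copy, and the copy activity's policy carries the retry settings. The copy activity's dataset references are omitted here; they would match the earlier sketch.

    ```json
    [
      {
        "name": "LookupSourceProperties",
        "type": "Lookup",
        "typeProperties": {
          "source": {
            "type": "AzureSqlSource",
            "sqlReaderQuery": "SELECT IsActive FROM SourceProperties WHERE SheetName = 'DK'"
          },
          "dataset": { "referenceName": "ds_sql_sourceproperties", "type": "DatasetReference" },
          "firstRowOnly": true
        }
      },
      {
        "name": "IfDKActive",
        "type": "IfCondition",
        "dependsOn": [
          { "activity": "LookupSourceProperties", "dependencyConditions": [ "Succeeded" ] }
        ],
        "typeProperties": {
          "expression": {
            "value": "@equals(activity('LookupSourceProperties').output.firstRow.IsActive, 1)",
            "type": "Expression"
          },
          "ifTrueActivities": [
            {
              "name": "CopyDKSheet",
              "type": "Copy",
              "policy": { "retry": 2, "retryIntervalInSeconds": 60 },
              "typeProperties": {
                "source": { "type": "ExcelSource" },
                "sink": { "type": "SqlServerSink" }
              }
            }
          ]
        }
      }
    ]
    ```

    The same pattern can be repeated for the SE sheet, or generalized with a ForEach over the rows returned by the Lookup if you want the pipeline to stay fully metadata driven.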

    Please 'Upvote'(Thumbs-up) and 'Accept' as an answer if the reply was helpful. This will benefit other community members who face the same issue.
