ADF - ETL Improve Performance

Santhi Dhanuskodi 265 Reputation points
2024-08-28T08:57:51.8133333+00:00

Hi,

I have implemented SCD type 4 for loading our source files into Azure SQL.

We have about 3 years of files (7,000+ files), and we need to load all of these files into Azure SQL.

We need to preserve the history of the data. Live/latest data will be in the Target table; older/history data will be in the Target_History table. If I have 100 records on the first day, they will be in the Target table on the first day.

If I receive 100 records on the second day (98 are unchanged, 2 are updated), we will update the Target table with the 2 updated records, and the 2 old versions of those records that were in the Target table on the first day will be moved to Target_History.

This is the logic being implemented.

So we load the files in order, from the oldest date to the newest.

But we are concerned about the amount of time it takes to load all these files, and everyone has to wait until all the files are loaded. Is there a better design for this?

Azure Data Factory

1 answer

  1. AnnuKumari-MSFT 32,661 Reputation points Microsoft Employee
    2024-08-28T11:36:35.43+00:00

    Hi @Santhi Dhanuskodi,

    Thank you for using the Microsoft Q&A platform and for posting your query here.

    As per my understanding, you want to optimize the pipeline and reduce the overall execution time of the ETL process that moves the old data to the history table and loads the new data into the target table.

    I would like to understand whether you are performing an incremental load for the new records, or a complete truncate-and-reload of the target table after the data is moved to the history table. If not already implemented, you could consider an incremental load process that upserts only the changed records instead of performing a full load, as sketched below.
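
    The following is a minimal T-SQL sketch of such an upsert, assuming the incoming file has first been staged into a dbo.Staging table and that Id is the business key (all table and column names here are illustrative, not taken from your pipeline):

    ```sql
    -- Assumes dbo.Staging holds the current file's rows and that
    -- Target, Target_History, and Staging share the same columns.
    BEGIN TRANSACTION;

    -- 1. Copy the outgoing versions of changed rows into the history
    --    table before they are overwritten. NULL-safe comparisons are
    --    omitted for brevity.
    INSERT INTO dbo.Target_History (Id, Col1, Col2, LoadDate)
    SELECT t.Id, t.Col1, t.Col2, t.LoadDate
    FROM dbo.Target AS t
    INNER JOIN dbo.Staging AS s
        ON s.Id = t.Id
    WHERE s.Col1 <> t.Col1 OR s.Col2 <> t.Col2;

    -- 2. Upsert only the new and changed rows into the target table.
    MERGE dbo.Target AS t
    USING dbo.Staging AS s
        ON t.Id = s.Id
    WHEN MATCHED AND (s.Col1 <> t.Col1 OR s.Col2 <> t.Col2) THEN
        UPDATE SET t.Col1 = s.Col1, t.Col2 = s.Col2, t.LoadDate = s.LoadDate
    WHEN NOT MATCHED THEN
        INSERT (Id, Col1, Col2, LoadDate)
        VALUES (s.Id, s.Col1, s.Col2, s.LoadDate);

    COMMIT TRANSACTION;
    ```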

    One possible way to significantly reduce the execution time of moving the "old records which were in Target table in the first day" to Target_History is to switch the table out via a schema transfer in Azure SQL. A schema transfer is a metadata-only operation, so it completes almost instantly regardless of the table's size.

    For example, suppose the data is present in the target table dbo.table1. You can transfer that table to a staging schema and rename it with a date suffix, say stg.table1_20240828, and then load the new data into a fresh dbo.table1.
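
    A minimal sketch of that switch, assuming a stg schema already exists and that the table and date suffix names are illustrative:

    ```sql
    BEGIN TRANSACTION;

    -- Move the current target table into the staging schema; this is a
    -- metadata-only change, so the row count does not affect its duration.
    ALTER SCHEMA stg TRANSFER dbo.table1;
    EXEC sp_rename 'stg.table1', 'table1_20240828';

    -- Recreate an empty dbo.table1 with the same column structure to
    -- receive the new load. Note: SELECT INTO does not copy indexes,
    -- constraints, or defaults; recreate those separately if needed.
    SELECT *
    INTO dbo.table1
    FROM stg.table1_20240828
    WHERE 1 = 0;

    COMMIT TRANSACTION;
    ```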

    Other ways for optimization:

    • Parallel Processing:
      • Use the ForEach activity in ADF with isSequential set to false to process multiple files simultaneously, and raise its batchCount (up to 50).
      • Increase the degree of parallelism (parallelCopies) and the Data Integration Units on the Copy activity.
      • Note that since your files must be applied in date order for the SCD logic, parallelism is best applied to staging the raw files, while the ordered merge into Target remains sequential.
    • Optimize SQL Operations:
      • Ensure that your Target and Target_History tables are properly indexed on the key columns used for matching (see the index sketch after this list).
    • High availability and scalability of the integration runtime (IR):
      • If you use a self-hosted IR, add nodes to scale it out and provide high availability; with the Azure IR, allow more Data Integration Units per copy.
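
    For instance, indexes on the business key speed up both the history lookup and the merge matching. A hedged example, assuming Id is the key, the tables are currently heaps, and all names are illustrative:

    ```sql
    -- Unique clustered index on the target's business key; this fails if
    -- a clustered index or primary key already exists on the table.
    CREATE UNIQUE CLUSTERED INDEX IX_Target_Id
        ON dbo.Target (Id);

    -- Nonclustered index supporting point-in-time lookups in history.
    CREATE NONCLUSTERED INDEX IX_Target_History_Id_LoadDate
        ON dbo.Target_History (Id, LoadDate);
    ```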

    Hope it helps. Kindly accept the answer if you found it helpful. Please let me know if there is any further query. Thank you.

