Addressing Performance Discrepancies in Azure Data Factory Pipelines

Smaran Thoomu 19,955 Reputation points Microsoft Vendor
2024-07-31T09:31:36.01+00:00

We are seeing a performance discrepancy between our development and production Azure Data Factory (ADF) pipelines. The development pipeline runs in 45 to 50 minutes, while the production pipeline takes about 1 hour and 5 minutes. Both environments extract data from SAP to Azure Data Storage using ADF, but production is configured for 55 concurrent jobs, whereas development is configured for 36. Could this be the cause, and how can we resolve it?

PS - Based on common issues that we have seen from customers and other sources, we are posting these questions to help the Azure community.
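
To put the gap described in the question into numbers, here is a minimal Python sketch. It uses only the figures stated above (45 to 50 minutes in development, about 65 minutes in production, 36 versus 55 concurrent jobs). The function name `relative_increase` is a hypothetical helper chosen for illustration, not part of any ADF tooling.

```python
def relative_increase(baseline: float, observed: float) -> float:
    """Return how much larger `observed` is than `baseline`, as a fraction of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (observed - baseline) / baseline


# Figures taken from the question; 1 hour 5 minutes is 65 minutes.
DEV_MINUTES_FAST = 45
DEV_MINUTES_SLOW = 50
PROD_MINUTES = 65
DEV_CONCURRENT_JOBS = 36
PROD_CONCURRENT_JOBS = 55

# Slowdown measured against the slowest and the fastest development run.
slowdown_low = relative_increase(DEV_MINUTES_SLOW, PROD_MINUTES)
slowdown_high = relative_increase(DEV_MINUTES_FAST, PROD_MINUTES)

# Difference in the configured concurrent-job setting.
concurrency_gap = relative_increase(DEV_CONCURRENT_JOBS, PROD_CONCURRENT_JOBS)

print(f"Production is {slowdown_low:.0%} to {slowdown_high:.0%} slower than development")
print(f"Production allows {concurrency_gap:.0%} more concurrent jobs than development")
```

Having both percentages side by side makes it easier to judge, after any configuration change, whether the run-time gap moves together with the concurrency setting.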

Azure Data Factory

1 answer

  1. Smaran Thoomu 19,955 Reputation points Microsoft Vendor
    2024-07-31T10:39:18.4233333+00:00

    Greetings!

    The performance discrepancy between the development and production pipelines is likely due to the difference in concurrent job settings: production is configured for 55 concurrent jobs, while development is set to 36. A higher limit allows more extractions to compete for the same source and integration runtime resources at once, which can noticeably affect run time, especially for large data extractions from SAP to Azure Data Storage.

    To address this, align the concurrent job setting across the two environments, either by changing production to match development or the reverse. With the setting aligned, run times should become more consistent, and any gap that remains would point to a different cause.

    Additionally, ensure that your Self-Hosted Integration Runtime (SHIR) is optimized and that multiple nodes are online and connected so that the workload is distributed efficiently. This can help mitigate memory issues that arise when work is processed sequentially.

    It's also important to continue monitoring the pipeline performance after making these adjustments and run further tests to ensure stability. If the issue persists, consider reviewing other environmental factors or configurations that might affect performance.

    Hope this helps. If you have any follow-up questions, please let me know. I would be happy to help.

    Please do not forget to "up-vote" wherever the information provided helps you, as this can be beneficial to other community members.
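
As a rough illustration of the alignment step suggested in the answer, the Python sketch below compares two environment snapshots and reports every setting that differs. The dictionary keys (`source`, `sink`, `concurrent_jobs`) and the function `diff_settings` are hypothetical names chosen for this example. Only the values themselves (SAP, Azure Data Storage, 36 and 55) come from the question, and how you export the real settings from your environments is left open.

```python
from typing import Any


def diff_settings(dev: dict[str, Any], prod: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return the settings that differ, mapped to (dev_value, prod_value).

    A key that is missing from one environment is reported with None on that side.
    """
    differences: dict[str, tuple[Any, Any]] = {}
    for key in sorted(dev.keys() | prod.keys()):
        dev_value = dev.get(key)
        prod_value = prod.get(key)
        if dev_value != prod_value:
            differences[key] = (dev_value, prod_value)
    return differences


# Hypothetical snapshots: key names are assumptions, values are from the question.
dev_settings = {"source": "SAP", "sink": "Azure Data Storage", "concurrent_jobs": 36}
prod_settings = {"source": "SAP", "sink": "Azure Data Storage", "concurrent_jobs": 55}

for name, (dev_value, prod_value) in diff_settings(dev_settings, prod_settings).items():
    print(f"{name}: development={dev_value}, production={prod_value}")
```

Running a comparison like this before and after a change gives a quick record of which settings were actually aligned, so a remaining run-time gap can be attributed to something else.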

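
For the monitoring step recommended in the answer, a small script can summarize run durations and flag whether production is within an acceptable margin of development. This is a sketch under stated assumptions: the functions `summarize_runs` and `within_tolerance` are hypothetical, the 10% tolerance is an arbitrary placeholder, and the sample durations are simply the figures from the question rather than measured results. In practice the durations would come from your own pipeline run history.

```python
from statistics import mean, median


def summarize_runs(durations_minutes: list[float]) -> dict[str, float]:
    """Summarize a list of pipeline run durations given in minutes."""
    if not durations_minutes:
        raise ValueError("at least one run duration is required")
    return {
        "runs": len(durations_minutes),
        "mean": mean(durations_minutes),
        "median": median(durations_minutes),
        "max": max(durations_minutes),
    }


def within_tolerance(dev_minutes: list[float], prod_minutes: list[float],
                     tolerance: float = 0.10) -> bool:
    """Return True if the production median is within `tolerance` of the development median.

    The default of 10% is an assumed threshold, not a recommended value.
    """
    dev_median = median(dev_minutes)
    prod_median = median(prod_minutes)
    return abs(prod_median - dev_median) <= tolerance * dev_median


# Placeholder durations taken from the question (minutes), not measured data.
dev_runs = [45, 50]
prod_runs = [65]

print("development:", summarize_runs(dev_runs))
print("production:", summarize_runs(prod_runs))
print("consistent:", within_tolerance(dev_runs, prod_runs))
```

Comparing medians rather than single runs reduces the chance that one unusually slow or fast execution drives the conclusion.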
