ADF Copy activity takes days to complete ingestion into ADLS Gen2 from SAP HANA on-premises

CON-Thirusenthilkumar Pandiyan 45 Reputation points
2024-02-07T15:00:02.59+00:00

Hello, we are loading SAP on-premises data into ADLS Gen2 using the ADF Copy activity. We have done all the needed configuration to connect to the SAP on-premises source. We have 500 tables, and all of them work fine except for 5 tables that have more than 1 billion records each. Those 5 tables take more than 3 days to complete the initial full load. Is there anything that will help increase the performance? Thanks, Thiru


1 answer

  1. Harishga 1,675 Reputation points Microsoft Vendor
    2024-02-08T06:00:56.4166667+00:00

    Hi @CON-Thirusenthilkumar Pandiyan
    Welcome to Microsoft Q&A platform and thanks for posting your question here.

    When loading 5 tables with more than 1 billion records each from SAP on-premises to ADLS Gen2 using the ADF Copy activity, slow performance can have several causes: high network latency, large data volume, complex data structure, insufficient hardware resources, and suboptimal copy activity configuration.

    Here are the steps you can take to optimize the performance when transferring large amounts of data from SAP On-Prem to ADLS Gen2 using ADF Copy activity:

    • Use Azure ExpressRoute or VPN Gateway to establish a dedicated and private connection between the SAP On-Prem source and ADLS Gen2 to reduce network latency.
    • Filter out unnecessary data, reduce the number of columns, and use incremental loading to optimize data transfer.
    • Flatten the data, denormalize the data, and use simple data types to optimize the data structure.
    • Increase the memory, CPU, or disk space of the source system or destination system to optimize hardware resources.
    • Optimize the copy activity configuration by setting the "parallelCopies" property, leveraging the source partition option, or using PolyBase where the sink supports it.
    • Use Azure Monitor to monitor performance metrics and troubleshoot any issues.
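    As a rough sketch of the last configuration point, a Copy activity definition with the SAP HANA connector might enable source partitioning and parallel copies like this. The table name, partition column, and parallelCopies value below are placeholders to adapt to your workload, not values from the original question:

    ```json
    {
        "name": "CopyLargeSapHanaTable",
        "type": "Copy",
        "typeProperties": {
            "source": {
                "type": "SapHanaSource",
                "query": "SELECT * FROM \"MYSCHEMA\".\"LARGE_TABLE\"",
                "partitionOption": "SapHanaDynamicRange",
                "partitionSettings": {
                    "partitionColumnName": "ID"
                }
            },
            "sink": {
                "type": "ParquetSink"
            },
            "parallelCopies": 16
        }
    }
    ```

    With a partition option set, the service splits the source read into multiple range queries and runs them concurrently up to the parallelCopies limit, which is usually where the biggest gains come from on billion-row tables.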

    Reference:
    https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-performance-features
    https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-performance-troubleshooting#troubleshoot-copy-activity-on-self-hosted-ir

    I hope this information helps you. Let me know if you have any further questions or concerns.
