Self hosted integration runtime - poor performance on copy activity from Oracle

Manon G 0 Reputation points
2023-08-22T14:43:41.6166667+00:00

Hello,

We have created a SHIR hosted on our reporting server and running pipelines to copy data from Oracle database to ADLS2.

Comparing to our previous tests with SQL Server, copy activity takes way too long to read a single table but containing a large dataset (dozens of millions of lines).

We're using data partitionning when reading data from Oracle and whiting to 5 parquet files, the pipeline is set with 4 DIU and degree of parallelism to 4.

Already checked any conversion issues with decimal data types; there's no high precision data in the dataset.

CPU and memory of the host server look fine on running one pipelie but spikes to 100% if running 5 jobs in parralel.

For information, the host server is a high capacity server.

Comparing to SQL server hosted on Azure VM and running through a simular setup, the performance drop from Oracle is drastic.

Any hints on where to look for the rootcause?

Thank you in advance,

Best regards

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
11,624 questions
{count} votes

1 answer

Sort by: Most helpful
  1. ShaikMaheer-MSFT 38,546 Reputation points Microsoft Employee Moderator
    2023-08-23T10:03:37.76+00:00

    Hi Manon G,

    Thank you for posting query in Microsoft Q&A Platform.

    There could be several reasons why the copy activity from Oracle to ADLS2 is taking longer than expected. Here are some possible causes and suggestions which I am listing down. Please check them.

    Network latency: If the Oracle database is hosted on-premises or in a different region than the ADLS2 storage account, network latency could be a factor in the slow performance. You can try running the pipeline from a VM in the same region as the storage account to see if this improves performance.

    Data volume: If the table being copied contains a large amount of data, this could be a factor in the slow performance. You can try partitioning the data further or increasing the number of DIUs to see if this improves performance.

    Data type conversion: If there are data type conversions happening during the copy activity, this could be a factor in the slow performance. You can try optimizing the data type conversions or eliminating them altogether to see if this improves performance.

    Parallelism: If the degree of parallelism is set too low, this could be a factor in the slow performance. You can try increasing the degree of parallelism to see if this improves performance.

    Resource utilization: If the CPU and memory utilization on the host server are spiking to 100%, this could be a factor in the slow performance. You can try optimizing the resource utilization by tuning the pipeline settings or increasing the resources available to the host server.

    Database performance: If the Oracle database is experiencing performance issues, this could be a factor in the slow performance. You can try optimizing the database performance by tuning the database settings or running performance tests to identify any bottlenecks.

    Data format: If the data is in a format that is not optimized for Parquet, this could be a factor in the slow performance. You can try optimizing the data format by converting it to a more efficient format or using compression to reduce the data size.

    By troubleshooting these potential causes, you should be able to identify the root cause of the slow performance and take steps to improve it. Hope this helps. Please let me know if any further queries.


    Please consider hitting Accept Answer button. Accepted answers help community as well.


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.