Investigate if the SAP HANA database itself is a bottleneck. This could be due to query performance, indexes, or how data is structured and stored in SAP HANA. Optimizing queries or adding indexes might help.
The Standard D4s v5 VM used for SHIR might not be sufficient for your workload. Even though you have not observed CPU and RAM issues, it's possible that the VM is not powerful enough for the data volume and complexity. Consider upgrading to a more powerful VM.
You mentioned that the maximum number of open connections to SAP HANA is always 4 and the maximum number of used DIUs is 2. This limitation could be due to configuration settings in Azure Data Factory or limits within SAP HANA. Investigate if there are any configurations or limits that can be adjusted to increase these numbers.
The copy activity’s performance in Azure Data Factory can be optimized by fine-tuning the settings such as batch size, parallel copy operations, and retry policies. Experimenting with different configurations might yield better results.
Increasing DIUs is a good approach, but it needs to be balanced with the capabilities of the source and destination systems. Overallocating DIUs can lead to underutilization and bottlenecks elsewhere.
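
When experimenting with different settings, it may help to first turn the figures reported in the question (3,492,246 rows x 184 columns, 16 minutes end-to-end, 12 of them spent on extraction and staging) into a baseline throughput, so that each test run can be compared against the same number. The following is only a minimal sketch of that arithmetic; the function name and the dictionary keys are illustrative and not part of Azure Data Factory.

```python
def extraction_throughput(rows: int, columns: int, minutes: float) -> dict:
    """Convert a run's row count and duration into simple throughput figures.

    The name and return keys are hypothetical, chosen for this example only.
    """
    seconds = minutes * 60
    return {
        "rows_per_second": rows / seconds,
        "cells_per_second": rows * columns / seconds,
    }


# Figures taken from the question: 12 of the 16 minutes are spent
# pulling from SAP HANA and writing to the staging repository.
baseline = extraction_throughput(rows=3_492_246, columns=184, minutes=12)
extraction_share = 12 / 16  # fraction of the end-to-end time spent extracting

print(f"rows/s:  {baseline['rows_per_second']:.0f}")
print(f"cells/s: {baseline['cells_per_second']:.0f}")
print(f"extraction share of total time: {extraction_share:.0%}")
```

With these inputs the extraction leg works out to roughly 4,850 rows per second and accounts for 75% of the total run time, which suggests the SAP HANA to staging step is the one to measure after each configuration change.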
How to improve extraction performance of SAP HANA Azure Data Factory Connector
Hi Experts,
With Azure Data Factory I am running tests that read data from a SAP HANA database and store it in Synapse tables, but I am seeing disappointing performance. An example: using PolyBase, 3,492,246 rows x 184 columns are transferred end-to-end in 16 minutes, 12 of which are spent just pulling from SAP HANA and writing to the staging repository.
Note that the "Physical partitions of table" option is enabled and the SHIR runs on a fully dedicated Standard D4s v5 VM (4 vCPUs, 16 GiB memory), with a limit of 16 concurrent jobs.
I tried many parameter combinations:
- Increase Packet size (KB) up to 20960
- Increase the maximum data integration units
- Increase the degree of copy parallelism
- Increase the SHIR concurrent jobs limit
- Disable performance metrics analytics
but the final result is always roughly the same; in fact, sometimes it gets worse.
I also noticed that the maximum number of open connections to SAP HANA is always 4. Similarly, in the "Blob Storage -> Synapse Analytics" transfer, the maximum number of used DIUs is always 2 and the number of used parallel copies is always 1.
Do you have any idea what could cause such poor performance? Could it depend on the SHIR VM (although I have never seen it struggle in terms of CPU and RAM during the flows)? What can I try in order to investigate further? Or are my expectations simply too high?
Thank you very much in advance for your feedback
Luca
Azure Data Factory
1 answer
Amira Bedhiafi 34,101 Reputation points Volunteer Moderator
2024-01-21T12:41:46.9633333+00:00