Hi Vaibhav Tiwari,
Thank you for contacting to Microsoft QA.
It looks like you're encountering a significant performance issue with Azure Data Factory (ADF) while trying to copy data from Oracle tables that include CLOB columns. Since you mentioned that the process takes nearly 2 hours for a 1 GB table, and switching the CLOB to string reduces it to about 2 minutes (though that's not a viable long-term solution), I’d suggest a few different approaches to try and improve the performance without losing data integrity.
Here are some things you can try:
- Assess Table Partitioning:
- Since you're already using dynamic range partitioning, ensure the underlying Oracle table has proper partitioning to take advantage of parallel copies. The more partitions available, the better performance you might see. Collaborate with your Oracle DBA to establish a partitioning strategy that aligns with your data access patterns.
- Utilize Parallel Copies:
- Increase the number of parallel copy activities in ADF. If you haven’t already, set the
parallelCopiesproperty on the copy activity to allow ADF to run simultaneous reads from different partitions of the Oracle source.
- Increase the number of parallel copy activities in ADF. If you haven’t already, set the
- Permissions for Partition Detection:
- Make sure that the ADF service has permissions to access the
DBA_TAB_PARTITIONStable in your Oracle database. If this access is not granted, ADF can't detect the partitions, which can severely limit parallelism.
- Make sure that the ADF service has permissions to access the
- Execution Metrics Review:
- Use the copy activity monitoring features to analyze where the time is being consumed during the copy process. Look for indications that the bottleneck is happening during data retrieval from Oracle.
- Custom Query with Dynamic Range Partitioning:
- If partitioning physically in your Oracle database isn't an option, consider writing a custom SQL query that mimics partitioning by using the
ORA_HASHfunction on your CLOB data to partition appropriately for the copy operation.
- If partitioning physically in your Oracle database isn't an option, consider writing a custom SQL query that mimics partitioning by using the
- Optimization and Tuning:
- Review the ADF performance troubleshooting documentation for additional tuning tips specific to your setup, including settings for the Integration Runtime and optimal Data Integration Units (DIUs) for your copy activity.
- Check Network and Resource Bottlenecks:
- Ensure that there are no network issues or resource bottlenecks on either end (the Oracle database or ADF), which could cushion the throughput.
Additional Documentation:
- Troubleshoot Performance Issues in Azure Data Factory Copy Activities
- Copy Data from and to Oracle by using Azure Data Factory
- Copy Activity Performance and Tuning
If after trying these recommendations you're still facing issues, it's a good idea to raise a case with Azure support for a deeper investigation.
Follow-Up Questions:
- Can you confirm the current partitioning strategy of your Oracle table? Is it partitioned on any columns?
- What Integration Runtime are you using for your ADF job?
- Could you provide any performance statistics or run metrics from Azure Data Factory for this operation?
- Are there any known network limitations between your ADF environment and your Oracle database?
Hope this helps you get closer to a solution! Let me know if you need further assistance!