Working with very large XML files

Alekya Seemakurty, Sri 86 Reputation points
2024-02-07T17:05:02.7033333+00:00

I have a very large (1 GB) XML source file that I am writing to a table using an Azure Data Factory data flow. It has been running for more than 3-4 hours to write the source file to the table. I tried partitioning at the source and sink and used round-robin with 50 partitions, but it still didn't help much. Can you help me deal with such huge XML files?

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. QuantumCache 20,366 Reputation points Moderator
    2024-02-08T01:45:47.19+00:00

    Hello @Alekya Seemakurty, Sri

    What is the source location?

    • Did you try another Partition Type, such as "Hash"?
    • Keep in mind that increasing the number of partitions also increases the overhead of managing those partitions.
    • Try increasing the degree of parallelism in your data flow (see the sketch after this list).
    • Adjust the number of partitions and experiment with different partitioning strategies based on the data characteristics and processing requirements.
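
    On the parallelism point: in mapping data flows, the degree of parallelism is largely determined by the core count of the Azure Integration Runtime that the Execute Data Flow activity runs on, so a larger compute size lets more of your 50 partitions be processed at the same time. As a minimal sketch (the activity and data flow names below are placeholders and the core count is just an example; compare against the JSON your own factory generates), the activity-level compute settings look roughly like this:

    ```json
    {
        "name": "WriteLargeXmlToTable",
        "type": "ExecuteDataFlow",
        "typeProperties": {
            "dataflow": {
                "referenceName": "LoadXmlDataFlow",
                "type": "DataFlowReference"
            },
            "compute": {
                "coreCount": 16,
                "computeType": "General"
            }
        }
    }
    ```

    A bigger core count only helps if the work is actually spread across partitions, so combine it with the partitioning experiments above rather than relying on either change alone.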

    Sharing a few resources to read on this topic; hope this helps!

    Performance Tuning ADF Data Flow Sources and Sinks


    Mapping data flows performance and tuning guide

    Monitoring data flow performance

    Data Flow Monitor

