Thanks for sharing the details and the screenshot. Based on the information provided, your pipeline is copying a ~3.7 GB ZIP file from an SFTP source and decompressing it into over 120,000 files totaling ~28 GB in Azure Data Lake Storage Gen2. The entire process takes approximately 3.5 hours, which is understandably longer than you would like.
Here are a few recommendations to help reduce the copy duration:
Increase Parallel Copies - The run currently uses only 1 parallel copy. Increasing the parallelCopies setting on the Copy activity to 4 or more can significantly speed up writing the extracted files to ADLS Gen2; keep in mind that the single compressed source file still limits how much the read side from SFTP can be parallelized.
Use a More Powerful Integration Runtime (IR) - You're using 4 DIUs, which may be limiting throughput. Consider raising the dataIntegrationUnits setting on the Copy activity, since decompression runs on the integration runtime. If the SFTP server is on-premises or network-restricted, a Self-hosted IR on a machine with ample CPU and network bandwidth, placed close to the source, is also worth testing.
Pre-Decompress the ZIP File (If Possible) - Decompression inside the Copy activity can be time-consuming. If feasible, unzip the file before the copy step using an Azure Function, Logic App, or Databricks notebook; see the Python sketch after this list.
Split the Large ZIP into Smaller Ones - If you're able to control the ZIP file generation, splitting it into smaller ZIPs with ~5,000–10,000 files each lets the archives be processed in parallel and limits the impact of any single failure; a repackaging sketch is also included below.
Minimize Small File Writes - Writing 120,000+ small files is slow due to per-file metadata and transaction overhead. If your downstream processing supports it, consider batching or consolidating the small files into fewer, larger ones; a simple consolidation sketch is shown below.
Monitor Throughput - The reported throughput is ~298 KB/s, which is low. It is also roughly what you would expect if the metric reflects the compressed bytes read from SFTP: ~3.7 GB over ~3.5 hours works out to about 300 KB/s. That points to the SFTP source and the network path as likely bottlenecks, so verifying source bandwidth and SFTP server performance may reveal constraints outside ADF as well.
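On the pre-decompression suggestion, here is a minimal Python sketch of what an Azure Function or Databricks job could do, assuming the ZIP has already been staged to local (or DBFS) storage. The account URL, container name, credential, and paths are placeholders for your environment; it uses the azure-storage-file-datalake SDK.

```python
import zipfile
from azure.storage.filedatalake import DataLakeServiceClient

# Placeholders - replace with your storage account, container, and credential.
ACCOUNT_URL = "https://<your-account>.dfs.core.windows.net"
CONTAINER = "<your-container>"
CREDENTIAL = "<account-key-or-token-credential>"

def unzip_to_adls(local_zip_path: str, target_folder: str) -> None:
    """Extract every entry of a local ZIP and upload it to ADLS Gen2."""
    service = DataLakeServiceClient(account_url=ACCOUNT_URL, credential=CREDENTIAL)
    fs = service.get_file_system_client(CONTAINER)

    with zipfile.ZipFile(local_zip_path) as archive:
        for entry in archive.infolist():
            if entry.is_dir():
                continue
            data = archive.read(entry.filename)  # one entry at a time to bound memory
            file_client = fs.get_file_client(f"{target_folder}/{entry.filename}")
            file_client.upload_data(data, overwrite=True)

# Example: unzip_to_adls("/tmp/source.zip", "raw/unzipped")
```

For 120,000+ entries you would likely want to parallelize the uploads (for example with concurrent.futures.ThreadPoolExecutor) rather than uploading one file at a time.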
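If repackaging the archive is an option, a stdlib-only sketch along these lines splits one large ZIP into smaller ones of roughly 5,000 entries each (paths and chunk size are illustrative):

```python
import zipfile

def split_zip(source_path: str, output_prefix: str, files_per_chunk: int = 5000) -> None:
    """Repack one large ZIP into several smaller ZIPs of files_per_chunk entries each."""
    with zipfile.ZipFile(source_path) as src:
        names = [n for n in src.namelist() if not n.endswith("/")]  # skip directory entries
        for start in range(0, len(names), files_per_chunk):
            part = start // files_per_chunk
            out_path = f"{output_prefix}_{part:04d}.zip"
            with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_DEFLATED) as dst:
                for name in names[start:start + files_per_chunk]:
                    dst.writestr(name, src.read(name))

# Example: split_zip("/tmp/source.zip", "/tmp/source_part")
```

Each smaller ZIP can then be handled by its own Copy activity iteration, for example via a ForEach activity with parallel iterations enabled.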
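For the small-file point, the sketch below shows one simple way to concatenate many small extracted files into ~128 MB batch files before landing them in ADLS Gen2. This only makes sense if the files contain line-delimited records or are otherwise safe to concatenate; the directories and target size are placeholders.

```python
import os

def consolidate_small_files(input_dir: str, output_dir: str,
                            target_bytes: int = 128 * 1024 * 1024) -> None:
    """Concatenate many small files into batch files of roughly target_bytes each.

    Assumes each small file ends with a newline (or records are otherwise
    self-delimiting), so simple concatenation does not merge records.
    """
    os.makedirs(output_dir, exist_ok=True)
    batch_index, written, out = 0, 0, None
    for root, _, files in os.walk(input_dir):
        for name in sorted(files):
            with open(os.path.join(root, name), "rb") as f:
                data = f.read()
            # Start a new batch file when the current one would exceed the target size.
            if out is None or written + len(data) > target_bytes:
                if out is not None:
                    out.close()
                out = open(os.path.join(output_dir, f"batch_{batch_index:05d}.dat"), "wb")
                batch_index += 1
                written = 0
            out.write(data)
            written += len(data)
    if out is not None:
        out.close()

# Example: consolidate_small_files("/tmp/unzipped", "/tmp/batched")
```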
I hope this information helps. Please do let us know if you have any further queries.
Kindly consider upvoting this answer if the information provided was helpful, as it can assist other community members in resolving similar issues.
Thank you.