ADF process huge amount of files in data lake

Ronald Mussche 36 Reputation points
2023-01-20T13:00:09.7666667+00:00

Hi,

Every day we receive events and write them to the data lake, and we merge all the single files into one file for performance. Then we move the files to an archive folder. Now something is missing from the incremental load and we need to do a full load again. The archive folder contains more than 10M files. The copy activity has been running for 2 days and has processed 6M files with a throughput of approx. 9 KB/s. In my opinion this is way too slow. I tried raising "Degree of copy parallelism" and "Maximum data integration unit" to higher values, but it makes no difference. The source uses a wildcard filter to select files with a specific name.
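As a rough back-of-the-envelope check of these figures, the sketch below converts them into files per second and average bytes per file. It assumes the 2 days, 6M files and 9 KB/s all describe the same run, takes 1 KB as 1000 bytes, and treats 10M as the total file count (the post says "more than" 10M, so the remaining time is a lower bound).

```python
# Figures quoted in the question; all are approximate.
files_processed = 6_000_000
elapsed_seconds = 2 * 24 * 60 * 60           # "running for 2 days"
throughput_bytes_per_s = 9 * 1000            # "approx. 9KB/s", assuming 1 KB = 1000 bytes
total_files = 10_000_000                     # assumption: the post says "more than 10M"

files_per_second = files_processed / elapsed_seconds
avg_bytes_per_file = throughput_bytes_per_s / files_per_second
remaining_days = (total_files - files_processed) / files_per_second / 86_400

print(f"{files_per_second:.1f} files/s")
print(f"{avg_bytes_per_file:.0f} bytes per file on average")
print(f"at least {remaining_days:.2f} more days at this rate")
```

If these numbers hold, each file is only a few hundred bytes, which suggests the run may be limited by per-file overhead rather than by bandwidth.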

User's image

I already read this post, but it offers no proper solution:
https://learn.microsoft.com/en-us/answers/questions/919689/very-low-throughput-in-adf-copy-activity

Can anyone help me with this?

Thanks

Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. HimanshuSinha-msft 19,386 Reputation points Microsoft Employee
    2023-01-23T22:16:22.51+00:00

    Hello @Ronald Mussche, thanks for the question and for using the MS Q&A platform.

    I agree that the copy activity is behaving very slowly. I can suggest a few things to check.

    1. I see that the throttling error count is 10 in the screenshot. I am not sure why that is the case. If you have access to the container, I suggest you check that. This article may help:
      https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blob-scalable-app-verify-metrics
    2. Also, how does the copy behave if you copy the files to a different container? If it runs faster, that would mean something is not set right on the Archive folder.
    3. Also, can you tell me how many files are in that folder?

    Thanks, Himanshu

    Please Upvote and Accept as answer if the reply was helpful, this will be helpful to other community members.
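
For the third check above (how many files are in the folder, and how many the wildcard filter actually selects), a minimal sketch in Python could look like the following. The helper name `count_matching`, the sample paths and the pattern are hypothetical. In practice the names might come from a storage SDK listing (for example the `name` of each item returned by `FileSystemClient.get_paths` in `azure-storage-file-datalake`), which is not shown here. Note that `fnmatch` only approximates the ADF wildcard rules.

```python
from fnmatch import fnmatchcase
from typing import Iterable, Tuple


def count_matching(names: Iterable[str], pattern: str) -> Tuple[int, int]:
    """Return (total files seen, files whose base name matches the pattern)."""
    total = 0
    matched = 0
    for name in names:
        total += 1
        # Compare only the file name, not the folder part of the path.
        if fnmatchcase(name.rsplit("/", 1)[-1], pattern):
            matched += 1
    return total, matched


# Hypothetical sample listing; replace with the real path names.
sample = [
    "archive/events_001.json",
    "archive/events_002.json",
    "archive/other.csv",
]
total, matched = count_matching(sample, "events_*.json")
print(f"{matched} of {total} files match the wildcard")
```

Because the function accepts any iterable, it can consume a paged listing lazily instead of loading millions of names into memory.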