Hi Tomas Blažauskas,
Thanks for reaching out to Microsoft Q&A.
Looking at your current pipeline configuration, there are a few areas where you could potentially enhance throughput for the copy activity between Azure Table Storage accounts.
Here are a few suggestions that might help improve performance based on your setup:
- Consider lowering `writeBatchSize` if you observe throttling at the sink. Table Storage caps entity group transactions at 100 entities per batch, so consistently pushing larger batches can cause slowdowns. Reducing the batch size to around 100 entities per request might help.
- You can also experiment with increasing `maxConcurrentConnections` beyond 16 if your Table Storage can handle more concurrent requests without throttling. Try 32 or 64 and observe the impact.
- Since you have the concurrency set to 16, your current parallel copies count is aligned. However, increasing `parallelCopies` can be beneficial if your data is partitioned effectively. You can try 32 or 64, depending on how much the sink can handle in parallel without hitting throttling (see the first sketch after this list).
- Ensure that your source data is well distributed across partitions. Table Storage scalability targets apply per partition, so if most writes are concentrated on a few partition keys, those partitions become the bottleneck. If possible, redistribute data across more partitions for more balanced throughput.
- Increase the number of DIUs to 8 or 16 for more parallel processing power. DIUs determine the compute power available to each copy activity run, which can help if the integration runtime itself is the limitation.
- In some cases, using an intermediate staging location (such as Blob storage) might improve overall performance, especially if direct transfers between Table Storage accounts are hitting bottlenecks. This lets you copy data to a faster medium first and then write to the sink in smaller chunks. However, staging adds overhead, so it's something to test (see the staging sketch below).
- Since throttling can cause intermittent slowdowns, it may be useful to increase the activity retry count to 3–5 with a small interval (`retryIntervalInSeconds`: 30–60) to mitigate transient failures from throttling or timeouts during sink writes (see the retry policy sketch below).
- Your concurrency setting is aligned with `parallelCopies` and `maxConcurrentConnections`, but if your system can handle more parallel operations, you could try increasing it to 32 or more.
- The `batchCount` setting on the ForEach loop controls how many tables are copied concurrently. Since your pipeline is not sequential, increasing this value (e.g., to 20 or more) allows more tables to be processed in parallel, provided your Table Storage accounts and integration runtime can handle the load (see the ForEach sketch below).
- Use Azure Storage metrics to monitor whether you're hitting Table Storage throttling limits; the Transactions metric split by the ResponseType dimension (for example, `ServerBusyError` or `ClientThrottlingError`) shows throttled requests. If so, further reduce `writeBatchSize` or spread out the load by adjusting concurrency or introducing delays between batch operations.
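To make the copy activity knobs above concrete, here is a minimal sketch of the relevant JSON. The activity and dataset names (`CopyTableToTable`, `SourceTableDataset`, `SinkTableDataset`) are placeholders for your own, and the numbers are starting points to tune, not recommended values:

```json
{
    "name": "CopyTableToTable",
    "type": "Copy",
    "inputs": [ { "referenceName": "SourceTableDataset", "type": "DatasetReference" } ],
    "outputs": [ { "referenceName": "SinkTableDataset", "type": "DatasetReference" } ],
    "typeProperties": {
        "source": {
            "type": "AzureTableSource",
            "maxConcurrentConnections": 32
        },
        "sink": {
            "type": "AzureTableSink",
            "writeBatchSize": 100,
            "maxConcurrentConnections": 32
        },
        "parallelCopies": 32,
        "dataIntegrationUnits": 8
    }
}
```

Raise one value at a time and re-run, so you can attribute any throughput change to a single setting.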
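For the retry suggestion, note that retry settings live in the activity's `policy` element rather than in `typeProperties`. A sketch with the suggested values (the 2-hour timeout is only an assumed example; keep your existing timeout if you have one):

```json
{
    "name": "CopyTableToTable",
    "type": "Copy",
    "policy": {
        "timeout": "0.02:00:00",
        "retry": 3,
        "retryIntervalInSeconds": 60
    }
}
```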
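For the ForEach loop, a sketch of the `batchCount` change. The `tableList` parameter name is hypothetical, and the `activities` array (left empty here) would contain your copy activity from the first sketch:

```json
{
    "name": "ForEachTable",
    "type": "ForEach",
    "typeProperties": {
        "isSequential": false,
        "batchCount": 20,
        "items": {
            "value": "@pipeline().parameters.tableList",
            "type": "Expression"
        },
        "activities": []
    }
}
```

Keep in mind that `batchCount` maxes out at 50, and the effective parallelism is still bounded by what the storage accounts can absorb.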
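If you want to test the staged copy idea, the copy activity exposes an `enableStaging` flag pointing at a Blob storage linked service. This is a sketch only: the `StagingBlobStorage` name and path are placeholders, and you should verify staged copy behaves as expected for your Table-to-Table scenario before relying on it:

```json
{
    "name": "CopyTableToTable",
    "type": "Copy",
    "typeProperties": {
        "source": { "type": "AzureTableSource" },
        "sink": { "type": "AzureTableSink" },
        "enableStaging": true,
        "stagingSettings": {
            "linkedServiceName": {
                "referenceName": "StagingBlobStorage",
                "type": "LinkedServiceReference"
            },
            "path": "adfstaging"
        }
    }
}
```

Measure with and without staging; the extra hop only pays off when the direct path is the bottleneck.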
Note: Try these changes one at a time and re-measure after each, so you can narrow down which setting is limiting throughput.
Please 'Upvote' (Thumbs-up) and 'Accept' as an answer if the reply was helpful. This will benefit other community members who face the same issue.