Does the AllowBulkExecution option for the Cosmos client work with Parallel.ForEachAsync?

Debashis Jena 71 Reputation points
2024-02-15T14:49:00.11+00:00

In our Web API application we are trying to upload 3500 records to Cosmos DB, using the Parallel.ForEachAsync method. With the AllowBulkExecution option set to true on the Cosmos client, inserting the 3500 records takes 18 sec; without AllowBulkExecution it takes 20 sec. Is it recommended to use the AllowBulkExecution option with Parallel.ForEachAsync, or does it only work with Task.WhenAll?

ASP.NET Core
A set of technologies in the .NET Framework for building web applications and XML web services.
Azure Cosmos DB
An Azure NoSQL database service for app development.
C#
An object-oriented and type-safe programming language that has its roots in the C family of languages and includes support for component-oriented programming.

2 answers

  1. Sina Salam 3,081 Reputation points
    2024-02-16T11:37:39.8633333+00:00

    Hi @Debashis Jena

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    You asked why performance is similar with and without AllowBulkExecution when using Parallel.ForEachAsync to upload 3500 records to Cosmos DB, and whether the AllowBulkExecution option should be used with Parallel.ForEachAsync or only works with Task.WhenAll.

    In your scenario, you're using Parallel.ForEachAsync to upload 3500 records to Cosmos DB. This method distributes work across multiple tasks, which can potentially benefit from bulk execution if the SDK is allowed to batch operations together.

    The fact that you're seeing similar performance with and without AllowBulkExecution suggests that either the operations are not being effectively batched, or the overhead of batching is negligible compared to the overall operation time.

    It's generally recommended to use AllowBulkExecution when performing bulk operations, as it can improve performance by reducing the number of round trips to the database. However, its effectiveness can depend on various factors such as the size of the documents being inserted, network latency, and the characteristics of the Cosmos DB instance.

    As for whether AllowBulkExecution only works with Task.WhenAll: that's not necessarily the case. AllowBulkExecution affects how individual requests are handled by the SDK, regardless of whether they are executed concurrently using Parallel.ForEachAsync, Task.WhenAll, or any other asynchronous mechanism.
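
    Since bulk mode groups operations that are in flight at the same time, what probably matters is how many requests the calling pattern keeps outstanding, not which API starts them. Parallel.ForEachAsync defaults MaxDegreeOfParallelism to Environment.ProcessorCount, so with default options only a handful of writes overlap, which may be one reason the two timings are so close. The following is a minimal sketch of both patterns, not a definitive implementation: the connection string, database and container names, the Record type, the "/pk" partition key path, and the value 200 are all assumptions made for illustration. It needs .NET 6 or later and the Microsoft.Azure.Cosmos package.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Threading.Tasks;
    using Microsoft.Azure.Cosmos;

    // Hypothetical document shape; the container is assumed to be partitioned on "/pk".
    public record Record(string id, string pk, string payload);

    public static class BulkUploadSketch
    {
        // Pattern A: start every write, then await them together.
        // All operations are outstanding at once, so the SDK has many to group.
        public static async Task UploadWithWhenAllAsync(Container container, IReadOnlyList<Record> records)
        {
            List<Task<ItemResponse<Record>>> tasks = records
                .Select(r => container.CreateItemAsync(r, new PartitionKey(r.pk)))
                .ToList();
            await Task.WhenAll(tasks);
        }

        // Pattern B: Parallel.ForEachAsync with an explicitly raised degree of parallelism.
        // 200 is an arbitrary starting point to tune, not a recommended value.
        public static async Task UploadWithParallelForEachAsync(Container container, IReadOnlyList<Record> records)
        {
            var parallelOptions = new ParallelOptions { MaxDegreeOfParallelism = 200 };
            await Parallel.ForEachAsync(records, parallelOptions, async (r, ct) =>
            {
                await container.CreateItemAsync(r, new PartitionKey(r.pk), cancellationToken: ct);
            });
        }

        public static async Task Main()
        {
            // Bulk mode is a client-level option, set once when the client is created.
            var options = new CosmosClientOptions { AllowBulkExecution = true };
            using var client = new CosmosClient("<connection-string>", options);
            Container container = client.GetContainer("<database>", "<container>");

            // 3500 synthetic records spread over 50 assumed partition key values.
            List<Record> records = Enumerable.Range(0, 3500)
                .Select(i => new Record(Guid.NewGuid().ToString(), $"pk-{i % 50}", "data"))
                .ToList();

            // Run one pattern per test; running both with the same ids would produce conflicts.
            await UploadWithWhenAllAsync(container, records);
        }
    }
    ```

    Timing each pattern separately, against fresh records, should show whether the degree of concurrency is what limits the benefit of bulk mode in your case.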

    You might consider experimenting with different batch sizes and profiling the performance of your application under various conditions to optimize performance further. In addition, ensure that your Cosmos DB instance is properly provisioned to handle the expected workload.

    I hope this is helpful! Do not hesitate to let me know if you have any other questions.

    Please remember to "Accept Answer" if the answer helped, so that others in the community facing similar issues can easily find the solution.

    Best Regards,

    Sina


  2. Oury Ba-MSFT 16,076 Reputation points Microsoft Employee
    2024-02-20T19:32:00.87+00:00

    @Debashis Jena Thank you for reaching out. Bulk refers to scenarios that require a high degree of throughput, where you need to load a large volume of data and do it with as much throughput as possible. My understanding here is that you are only working with 3500 records.

    Bulk mode enables IO optimization whenever possible for a related group of operations. For large ingestion of documents, bulk mode generally performs better than individual writes (CreateItem calls). In bulk insertion scenarios, network IO will generally be the bottleneck.

    In .NET, the choice between Parallel.ForEach and explicit task management is up to the application, and needs to be evaluated based on the application's needs (degree of parallelism, etc.). Some common pitfalls of ForEach are covered at https://learn.microsoft.com/en-us/dotnet/standard/parallel-programming/potential-pitfalls-in-data-and-task-parallelism
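
    To make the difference between the two usage models concrete, here is a small self-contained sketch that does not touch Cosmos DB at all. It replaces the write with a hypothetical FakeInsertAsync (a 50 ms Task.Delay) and records how many calls overlap under each model. All names and numbers in it are assumptions for illustration only.

    ```csharp
    using System;
    using System.Linq;
    using System.Threading.Tasks;

    public static class InFlightDemo
    {
        private static readonly object Gate = new();
        private static int _inFlight;
        private static int _peak;

        // Stand-in for one database write: counts how many calls are running at once.
        private static async Task FakeInsertAsync()
        {
            lock (Gate)
            {
                _inFlight++;
                _peak = Math.Max(_peak, _inFlight);
            }
            await Task.Delay(50); // simulated network round trip
            lock (Gate)
            {
                _inFlight--;
            }
        }

        private static void Reset()
        {
            lock (Gate) { _inFlight = 0; _peak = 0; }
        }

        // Parallel.ForEachAsync never runs more than `degree` bodies at the same time.
        public static async Task<int> PeakWithParallelForEachAsync(int count, int degree)
        {
            Reset();
            var options = new ParallelOptions { MaxDegreeOfParallelism = degree };
            await Parallel.ForEachAsync(Enumerable.Range(0, count), options,
                async (i, ct) => await FakeInsertAsync());
            return _peak;
        }

        // Task.WhenAll over eagerly started tasks imposes no limit on overlap.
        public static async Task<int> PeakWithWhenAllAsync(int count)
        {
            Reset();
            await Task.WhenAll(Enumerable.Range(0, count).Select(i => FakeInsertAsync()));
            return _peak;
        }

        public static async Task Main()
        {
            int limited = await PeakWithParallelForEachAsync(200, 4);
            int unlimited = await PeakWithWhenAllAsync(200);
            Console.WriteLine($"Parallel.ForEachAsync (degree 4) peak in flight: {limited}");
            Console.WriteLine($"Task.WhenAll peak in flight: {unlimited}");
        }
    }
    ```

    The first number cannot exceed the configured degree, while the second is typically close to the total item count. A throttled model limits how many operations are available to be grouped at any moment, whereas the unthrottled one may need its own limits for very large inputs.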

    There is a great talk on this: https://www.google.com/search?q=Azure+comsos+Bulk+Matias&rlz=1C1RXQR_enUS981US981&oq=Azure+comsos+Bulk+Matias&gs_lcrp=EgZjaHJvbWUyBggAEEUYOTIJCAEQIRgKGKAB0gEINzY3MmowajeoAgCwAgA&sourceid=chrome&ie=UTF-8#fpstate=ive&vld=cid:3cdcf15e,vid:TPtGhQY1pZQ,st:392

    Regards, Oury