Breyta

Deila með


Optimize performance for bulk operations

When you need to create or update thousands or millions of records in Dataverse, the choices you make can save hours of time for the bulk operation project to complete. This article describes the factors that affect performance and the options you have to build client applications that optimize performance for bulk operations. Consideration should also be given to other factors, such as the number of records, record size, network latency, and data complexity.

Table type

The type of table you choose to store your data has the greatest impact on how much throughput you can expect with bulk operations. Dataverse offers two types of tables: standard and elastic.

  • A standard table stores data using Azure SQL. Standard tables provide transaction support and greater capabilities for modeling relationships.
  • An elastic table stores data using Azure Cosmos DB. Elastic tables automatically scale horizontally to handle large amounts of data and high levels of throughput with low latency. Elastic tables are suitable for applications that have unpredictable, spiky, or rapidly growing workloads.

If data load times are your primary concern, elastic tables provide the best performance. Learn about when to use elastic tables

Business logic

Dataverse provides the capability to add extra business logic when records are created or updated by using plug-ins. Plug-ins can be registered to run synchronously before or within the transaction of a standard table operation. Elastic tables support plug-ins that can execute before an operation starts because there's no transaction. Any error that occurs in the plug-in code during a synchronous step for a standard table, or before the operation in an elastic table, cancels the operation. A plug-in developer can deliberately throw an exception to cancel the operation to ensure data validation logic is applied.

Any plug-in that is registered to run synchronously increases the time for the operation to complete. To preserve performance:

  • Limit the number of synchronous plug-ins that are registered for the operations. Add logic to run asynchronously by using an asynchronous plug-in or Power Automate flow unless it's essential the logic be applied synchronously.
  • Ensure the plug-ins you have are limited in the logic that they attempt to perform. While plug-ins must complete within a generous two-minute time limit, synchronous plug-ins that exceed two seconds to run seriously degrade performance.
  • Ensure plug-ins only run when necessary. Filter plug-ins for update operations to run only when specific columns are updated.
  • Make sure that plug-ins are optimized to perform logic as efficiently as possible. For standard tables, you need to consider the impact that transactions and record locks may have on performance. Learn about scalable customization design and other best practices for writing plug-ins.
  • Choose which API to register your plug-in on. You can apply synchronous logic to run on the more efficient CreateMultiple and UpdateMultiple bulk operation APIs. Learn how to write plug-ins for CreateMultiple and UpdateMultiple.

Bypass business logic

To speed up the bulk operation project, disable plug-ins registered for the create or update operations to improve performance. If the business logic isn't essential, or if you plan other steps to ensure eventual data consistency, manually disable the plug-in steps and re-enable them when the bulk operation project is complete. However, disabling plug-ins disables the logic from being applied from any client. Any user or other process adding data to Dataverse during this period won't have any of the business logic applied.

As a developer of a client application performing the bulk operation, you can apply an optional parameter to the requests you send to bypass logic. Only a system administrator, or users who are granted a specific privilege, can use this header. Learn more about how to bypass custom Dataverse logic.

Bulk operation APIs

Dataverse provides bulk operation APIs that enable the highest possible throughput for create and update operations. These APIs include CreateMultiple, UpdateMultiple, and UpsertMultiple. For elastic tables only, you can use DeleteMultiple.

While these APIs provide the highest throughput, they have the following limitations for standard tables:

  • Not currently available for all tables. Any custom table supports them, but not all the core Dataverse tables support them, such as Account or Contact. You can run a query provided in the documentation to determine whether a table can use these APIs.
  • Not forgiving of data errors. You need to make sure that the data you're changing is carefully scrubbed and validated. Any error that occurs within one operation in these APIs causes the entire operation to fail.
  • Not supported to use in plug-ins. Currently, only external client applications should use these APIs.

Bulk operations are available for all elastic tables, and elastic tables can return information about individual operations that fail. Learn more about bulk operations with elastic tables.

Batch APIs

If you can't use bulk operation APIs, use ExecuteMultiple with the SDK for .NET, and use OData $batch with Web API.

Use these APIs to group a set of operations in a single request and achieve greater efficiency primarily due to fewer, larger requests reducing the total payload sent and received over the wire for each operation. A client application doesn't need to wait for an operation to finish before sending the next request.

Each operation within the request is applied sequentially on the server, so there's no improved efficiency per operation. However, because the operations are done individually, you can get information about which operations failed, or stop the batch when an error occurs. You can send up to 1,000 operations per request, but for best results, start with a smaller number and experiment to determine what size batch works best for your case.

Note

Both bulk operation and batch APIs see significant performance gains when used in parallel. See Parallel requests.

Client architecture

Dataverse is designed as a data source to support multiple applications with large numbers of concurrent users. To optimize throughput, design your client to use Dataverse's strengths.

Bottlenecks in client-side code are the primary cause of performance problems. Developers frequently fail to fully use the capabilities of the code, which can affect performance. It's crucial to optimize how the client application utilizes the infrastructure's cores or compute, as failure to optimize can significantly affect performance. For example, when using Azure Functions, take extra steps to optimize performance, such as implementing autoscaling, using warm-up instances, adjusting CPU usage, utilizing multiple cores, and allowing concurrency.

Service protection limits

To ensure consistent availability and performance for everyone, Dataverse applies some limits to how APIs are used. These limits detect when client applications make extraordinary demands on server resources. Bulk operation projects always make extraordinary demands, so you need to be prepared to manage the errors that service protection limits return. If you aren't getting some service protection limit errors, you haven't maximized the capability of your application.

Service protection limit errors are just another kind of transient error that your client should be prepared to handle, like a temporary loss of network connectivity. A resilient client application must respond to the error by waiting and retrying. The only difference is that service protection limits tell you how long you need to wait before retrying.

To learn more, see:

Parallel requests

You can see a significant improvement in throughput by sending requests in parallel, but you need to understand how to send them correctly.

Not all environments are the same

Not every Dataverse environment has the same number of web server resources allocated to it. Dataverse scales to the need of the environment by adding more web server resources to support it. A production environment supporting thousands of active users requires more web servers than a trial environment. When your environment has a lot of web servers, sending requests in parallel can make a dramatic difference in the total throughput your client application can achieve.

Dataverse returns data in a response header that tells you a recommended degree of parallelization (DOP) for your environment. Performance worsens if you send more parallel requests than the response header recommends. The client hardware you use to run your application might need more CPU cores to send this many requests in parallel. You might need to use more clients to get maximum throughput. For example, you might use a scaled-out app service or Azure function.

Depending on your client-side architecture, you might need to split the recommended degree of parallelism. For example, when you have two clients, and your recommended DOP is 50, configure each client to use 25.

Disable Azure affinity

When appropriate, you see best results when you configure your client to use all the available web servers by removing the Azure affinity cookie that tries to associate your application to a single web server. Disabling Azure affinity isn't appropriate for interactive applications that use cached data from the server to optimize the user experience.

Optimize your connection

When using .NET, apply configuration changes like the following to optimize your connection so your requests aren't limited by default settings.

// Bump up the min threads reserved for this app to ramp connections faster - minWorkerThreads defaults to 4, minIOCP defaults to 4 
ThreadPool.SetMinThreads(100, 100);
// Change max connections from .NET to a remote service default: 2
System.Net.ServicePointManager.DefaultConnectionLimit = 65000;
// Turn off the Expect 100 to continue message - 'true' will cause the caller to wait until it round-trip confirms a connection to the server 
System.Net.ServicePointManager.Expect100Continue = false;
// Can decrease overall transmission overhead but can cause delay in data packet arrival
System.Net.ServicePointManager.UseNagleAlgorithm = false;

Recommendation summary

Based on the factors previously described, follow these recommendations to optimize throughput for bulk operation projects:

  • Choose a table type that fits your requirements. Elastic tables have much greater capacity for bulk operations.
  • Minimize, disable, or bypass custom business logic on the tables you're using. Configure your client application to bypass custom logic when appropriate.
  • Use Dataverse bulk operation APIs when you can, otherwise use batch APIs.
  • Design your client application to manage transient errors, including those errors returned by service protection limits.
  • Send requests in parallel. Use the response header to guide you to the recommended degree of parallelism (DOP). Disable the affinity cookie when appropriate.
  • Validate the data to ensure it meets the table column schema. This validation helps prevent errors and reduces the number of failed operations.

See also

Elastic tables
Use plug-ins to extend business processes
Bypass custom Dataverse logic
Bulk Operation messages
Write plug-ins for CreateMultiple and UpdateMultiple
Send parallel requests