Java bulk executor library: Download information
APPLIES TO: NoSQL
This is not the latest Java Bulk Executor for Azure Cosmos DB! Consider using Azure Cosmos DB Java SDK v4 for performing bulk operations. To upgrade, follow the instructions in the Migrate to Azure Cosmos DB Java SDK v4 guide and the Reactor vs RxJava guide.
On February 29, 2024, the Azure Cosmos DB Sync Java SDK v2.x will be retired. The SDK and all applications that use it, including Bulk Executor, will continue to function; Azure Cosmos DB will simply stop providing further maintenance and support for this SDK. We recommend following the instructions above to migrate to Azure Cosmos DB Java SDK v4.
| | |
|---|---|
| Description | The bulk executor library allows client applications to perform bulk operations in Azure Cosmos DB accounts. The bulk executor library provides the BulkImport and BulkUpdate namespaces. The BulkImport module can bulk ingest documents in an optimized way, so that the throughput provisioned for a collection is consumed to its maximum extent. The BulkUpdate module can bulk update existing data in Azure Cosmos DB containers as patches. |
| Bulk executor library in GitHub | GitHub |
| API documentation | Java API reference documentation |
| Get started | Get started with the bulk executor library Java SDK |
| Minimum supported runtime | Java Development Kit (JDK) 7+ |
- Fix the retry policy when GoneException is wrapped in IllegalStateException. This change is necessary to make sure the Gateway cache is refreshed on 410 (Gone), so that the Spark connector (for Spark 2.4) can use a custom retry policy that allows queries to succeed during partition splits.
- Fix an issue resulting in documents not always being imported on transient errors.
- Upgrade to use the latest Azure Cosmos DB Core SDK version.
- Improve handling of the RU budget provided through the Spark connector for bulk operations. The Spark connector first performs a one-time bulk import with a baseBatchSize and records the RU consumption of that batch. It then calculates a miniBatchSizeAdjustmentFactor from this RU consumption and adjusts the mini-batch size accordingly. For each subsequent batch import, a sleep duration is calculated from the elapsed time and the RU consumed. This limits RU consumption per second, and the thread pauses for that duration before the next batch import.
- Fix a bug preventing bulk updates when using a nested partition key.
- Fix DocumentAnalyzer.java to correctly extract nested partition key values from JSON.
- Add functionality to BulkDelete operations to retry on specific failures and to return a list of retriable failures to the user.
- Update for Azure Cosmos DB SDK version 2.4.7.
- Fix 'mergeAll' to continue past 'id' and the partition key value, so that any patched document properties placed after them are added to the updated item list.
- Set the starting degree of concurrency to 1 and add debug logs for mini-batches.
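The retry-policy fix for a GoneException wrapped in an IllegalStateException depends on looking past the outer exception to its cause. The Java sketch below shows one way to do that by walking the getCause() chain. It is not the library's actual retry policy. GoneException here is a hypothetical stand-in for the SDK class. GoneRetryCheck and isGone are illustrative names, and the depth limit of 32 is an arbitrary assumption.

```java
/** Hypothetical stand-in for the SDK's GoneException (HTTP 410 Gone). */
class GoneException extends RuntimeException {
    GoneException(String message) {
        super(message);
    }
}

/** Illustrative check a retry policy could run before deciding to retry on a 410. */
final class GoneRetryCheck {

    private GoneRetryCheck() {}

    /** Returns true if t, or any exception in its cause chain, is a GoneException. */
    static boolean isGone(Throwable t) {
        // Bounded walk (assumed limit of 32) in case of a cyclic cause chain.
        for (int depth = 0; t != null && depth < 32; depth++, t = t.getCause()) {
            if (t instanceof GoneException) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Throwable wrapped = new IllegalStateException(new GoneException("410 Gone"));
        System.out.println(isGone(wrapped));                            // true
        System.out.println(isGone(new IllegalStateException("other"))); // false
    }
}
```

A plain instanceof check on the outer IllegalStateException would return false for the wrapped case. This shows how a wrapped 410 can slip past logic that inspects only the outermost exception.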
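The RU-budget handling described for the Spark connector can be approximated with two small calculations. The following is a minimal sketch under stated assumptions, not the connector's implementation. The adjustment factor is assumed to be the per-second RU budget divided by the RU consumed by the initial base batch. The sleep is assumed to be the time the consumed RU would take at the budgeted rate, minus the time already elapsed. RuBudgetThrottle and its methods are hypothetical names.

```java
/**
 * Hypothetical sketch of RU-budget throttling for batched imports.
 * Names and formulas are illustrative assumptions, not the connector's code.
 */
final class RuBudgetThrottle {

    private final double ruBudgetPerSecond;

    RuBudgetThrottle(double ruBudgetPerSecond) {
        if (ruBudgetPerSecond <= 0) {
            throw new IllegalArgumentException("RU budget must be positive");
        }
        this.ruBudgetPerSecond = ruBudgetPerSecond;
    }

    /** Scales the base batch size so one mini-batch costs about one second of budget. */
    int adjustedMiniBatchSize(int baseBatchSize, double ruConsumedByBaseBatch) {
        if (baseBatchSize <= 0 || ruConsumedByBaseBatch <= 0) {
            throw new IllegalArgumentException("Base batch size and RU consumed must be positive");
        }
        double miniBatchSizeAdjustmentFactor = ruBudgetPerSecond / ruConsumedByBaseBatch;
        return Math.max(1, (int) Math.floor(baseBatchSize * miniBatchSizeAdjustmentFactor));
    }

    /** Milliseconds to pause so that consumedRu over (elapsed + pause) stays within budget. */
    long sleepMillis(double consumedRu, long elapsedMillis) {
        long minimumMillis = (long) Math.ceil(consumedRu * 1000.0 / ruBudgetPerSecond);
        return Math.max(0L, minimumMillis - elapsedMillis);
    }

    public static void main(String[] args) throws InterruptedException {
        RuBudgetThrottle throttle = new RuBudgetThrottle(1000.0); // 1,000 RU/s budget
        int miniBatchSize = throttle.adjustedMiniBatchSize(100, 500.0);
        long pause = throttle.sleepMillis(1000.0, 400L);
        System.out.println(miniBatchSize + " docs, sleep " + pause + " ms"); // 200 docs, sleep 600 ms
        Thread.sleep(pause); // pause the thread before the next batch import
    }
}
```

With a 1,000 RU/s budget, a 100-document base batch that costs 500 RU yields a factor of 2.0 and a 200-document mini-batch. A mini-batch that consumes 1,000 RU in 400 ms is then followed by a 600 ms pause.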
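The nested partition key fixes concern resolving a path such as /customer/address/city against a document's JSON. The sketch below assumes the JSON has already been parsed into nested Map objects and walks one property per path segment. NestedPartitionKey.extract is a hypothetical helper, not the DocumentAnalyzer API. It handles only simple paths with no escaped or quoted segments, and the example uses Map.of (Java 9+) for brevity.

```java
import java.util.Map;

/** Hypothetical helper that resolves a partition key path against a parsed JSON document. */
final class NestedPartitionKey {

    private NestedPartitionKey() {}

    /** Returns the value at the path, or null if any segment is missing. */
    static Object extract(Map<String, Object> document, String partitionKeyPath) {
        if (!partitionKeyPath.startsWith("/")) {
            throw new IllegalArgumentException("Partition key path must start with '/': " + partitionKeyPath);
        }
        Object current = document;
        // Skip the leading '/' and descend one property name per path segment.
        for (String segment : partitionKeyPath.substring(1).split("/")) {
            if (!(current instanceof Map)) {
                return null; // the path goes deeper than the document does
            }
            current = ((Map<?, ?>) current).get(segment);
            if (current == null) {
                return null;
            }
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = Map.of(
            "id", "order-1",
            "customer", Map.of("address", Map.of("city", "Seattle")));
        System.out.println(extract(doc, "/customer/address/city")); // Seattle
        System.out.println(extract(doc, "/customer/zip"));          // null
    }
}
```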