Data Loading Delay from Azure SQL DB to Cosmos DB Container

Mahesh Sanga 0 Reputation points Microsoft Employee
2024-11-29T02:31:24.8633333+00:00

We are experiencing an issue when loading data from Azure SQL to Cosmos DB. The process is extremely slow; for example, loading just 5 million rows takes around 3.5 hours. Additionally, we are unable to load data into a hierarchical partition-enabled container using Azure Data Factory (ADF).

Any insights or suggestions would be greatly appreciated.

Azure Cosmos DB
An Azure NoSQL database service for app development.
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. Amira Bedhiafi 27,051 Reputation points
    2024-12-01T17:38:38.3766667+00:00

    What factors are contributing to the slow data transfer?

    The data transfer rate from Azure SQL to Cosmos DB can be influenced by several factors, such as network latency, the configuration of the ADF pipeline, or write throughput limits in Cosmos DB. Review the Request Units (RUs) allocated to the Cosmos DB container: a low RU allocation can bottleneck write operations. Additionally, ensure that the ADF integration runtime runs in (or close to) the region of both the source and target databases.
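
    As a quick check, a few lines of Python against the SDK will show what the container is currently provisioned at. This is a minimal sketch using the azure-cosmos package; the endpoint, key, and database/container names are placeholders:

    ```python
    # Inspect and (optionally) raise a container's provisioned throughput with
    # the azure-cosmos Python SDK (v4). All names below are placeholders.
    from azure.cosmos import CosmosClient

    client = CosmosClient("https://<your-account>.documents.azure.com:443/", credential="<your-key>")
    container = client.get_database_client("<database>").get_container_client("<container>")

    offer = container.get_throughput()
    print(f"Current provisioned throughput: {offer.offer_throughput} RU/s")

    # A sustained bulk load usually needs far more RU/s than steady-state
    # traffic; scale up for the migration window, then back down to save cost.
    container.replace_throughput(20000)
    ```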

    Have you optimized the schema for Cosmos DB? When using hierarchical partitioning, it's crucial to design the schema and partition keys to align with the data's access patterns. Poor partitioning strategies can lead to uneven data distribution, causing hotspots and delays. Consider revisiting your partitioning strategy and consult Cosmos DB's Partitioning Design Guide for best practices.
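
    If you are creating the hierarchical container yourself, a minimal sketch with the azure-cosmos Python SDK looks like the following (a recent SDK version with MultiHash support is assumed, and the /tenantId and /userId key paths are purely illustrative -- pick paths that match your own access patterns):

    ```python
    # Create a container with a two-level hierarchical partition key using the
    # azure-cosmos Python SDK. Requires a recent SDK version with MultiHash
    # support; all names are placeholders.
    from azure.cosmos import CosmosClient, PartitionKey

    client = CosmosClient("https://<your-account>.documents.azure.com:443/", credential="<your-key>")
    database = client.get_database_client("<database>")

    container = database.create_container_if_not_exists(
        id="orders",
        # "MultiHash" enables hierarchical partition keys; the order of the
        # paths defines the hierarchy, broadest value first.
        partition_key=PartitionKey(path=["/tenantId", "/userId"], kind="MultiHash"),
    )
    ```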

    Are there alternative data ingestion methods to improve performance? Instead of relying solely on ADF, you could explore other tools such as Azure Functions, Spark with Cosmos DB connectors, or bulk executor libraries that might handle bulk data writes more efficiently. The Cosmos DB Bulk Executor Library is particularly useful for large-scale data ingestion and might address your performance concerns.
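
    As one illustration of the Spark route, the sketch below reads from Azure SQL over JDBC and bulk-writes to Cosmos DB with the azure-cosmos-spark connector. It assumes a notebook environment (e.g. Databricks or Synapse Spark) where `spark` is the ambient SparkSession and the connector is installed; connection values and table names are placeholders:

    ```python
    # Read the source table from Azure SQL over JDBC.
    df = (
        spark.read.format("jdbc")
        .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>")
        .option("dbtable", "dbo.SourceTable")
        .option("user", "<user>")
        .option("password", "<password>")
        .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
        .load()
    )

    # Bulk-write to Cosmos DB; bulk mode batches writes and can saturate the
    # container's RU/s far better than row-by-row inserts.
    cosmos_config = {
        "spark.cosmos.accountEndpoint": "https://<your-account>.documents.azure.com:443/",
        "spark.cosmos.accountKey": "<your-key>",
        "spark.cosmos.database": "<database>",
        "spark.cosmos.container": "<container>",
        "spark.cosmos.write.bulk.enabled": "true",
        "spark.cosmos.write.strategy": "ItemOverwrite",  # upsert semantics
    }
    df.write.format("cosmos.oltp").options(**cosmos_config).mode("APPEND").save()
    ```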

    Is your ADF pipeline configured for efficiency? Review the pipeline's performance settings. Ensure that batching is enabled and that the batch size aligns with the throughput capacity of your Cosmos DB container. Adjust the parallelism and retry settings in the copy activity to better manage the data flow. More guidance is available in the Azure Data Factory Performance and Scalability Guide.
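
    For example, the relevant knobs in a copy activity definition look roughly like this (the JSON is abbreviated, and the numbers are illustrative starting points to tune against your container's RU/s, not recommendations):

    ```json
    {
      "name": "CopySqlToCosmos",
      "type": "Copy",
      "typeProperties": {
        "source": { "type": "AzureSqlSource" },
        "sink": {
          "type": "CosmosDbSqlApiSink",
          "writeBehavior": "upsert",
          "writeBatchSize": 10000
        },
        "parallelCopies": 8,
        "dataIntegrationUnits": 16
      }
    }
    ```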

    Have you explored ADF limitations with hierarchical containers? If ADF is unable to load data directly into hierarchical partition-enabled containers, you might need to preprocess the data into a compatible format or use an intermediary staging step. This can be done by reshaping the data into a flat structure before ingestion, or by leveraging a tool that supports hierarchical writes, as in the sketch below. For further assistance, check out the Azure Cosmos DB and ADF Integration Documentation.
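
    A minimal sketch of such a staging step outside ADF, assuming pyodbc on the SQL side and azure-cosmos on the Cosmos side (all connection values, table, and column names are placeholders):

    ```python
    # Read rows from Azure SQL and upsert them into the hierarchical-partition
    # container. Every document must carry a value for each level of the
    # hierarchical partition key (here /tenantId and /userId).
    import pyodbc
    from azure.cosmos import CosmosClient

    sql_conn = pyodbc.connect(
        "Driver={ODBC Driver 18 for SQL Server};"
        "Server=<server>.database.windows.net;Database=<db>;"
        "Uid=<user>;Pwd=<password>;Encrypt=yes;"
    )

    cosmos = CosmosClient("https://<your-account>.documents.azure.com:443/", credential="<your-key>")
    container = cosmos.get_database_client("<database>").get_container_client("<container>")

    cursor = sql_conn.cursor()
    cursor.execute("SELECT Id, TenantId, UserId, Payload FROM dbo.SourceTable")
    for row in cursor:
        container.upsert_item({
            "id": str(row.Id),         # Cosmos DB requires a string id
            "tenantId": row.TenantId,  # level 1 of the hierarchical key
            "userId": row.UserId,      # level 2 of the hierarchical key
            "payload": row.Payload,
        })
    ```

    Row-by-row upserts like this will be slow at the 5-million-row scale, so treat it as a correctness check; for the full load, parallelize the writes or prefer the Spark/bulk route above.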

