Data Loading Delay from Azure SQL DB to Cosmos DB Container

Mahesh Sanga 0 Reputation points Microsoft Employee
2024-11-29T02:31:24.8633333+00:00

We are experiencing an issue when loading data from Azure SQL to Cosmos DB. The process is extremely slow; for example, loading just 5 million rows takes around 3.5 hours. Additionally, we are unable to load data into a hierarchical partition-enabled container using Azure Data Factory (ADF).

Any insights or suggestions would be greatly appreciated.

Azure Cosmos DB
An Azure NoSQL database service for app development.
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. Amira Bedhiafi 27,051 Reputation points
    2024-12-01T17:38:38.3766667+00:00

    What factors are contributing to the slow data transfer?

    The data transfer rate from Azure SQL to Cosmos DB can be influenced by several factors, such as network latency, the configuration of the ADF pipeline, or write throughput limits in Cosmos DB. Review the Request Units (RUs) allocated to the Cosmos DB container: a low RU allocation can bottleneck write operations. Additionally, ensure that the ADF integration runtime runs in (or close to) the region of both the source and target databases.
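
    As a quick check, a few lines of Python against the SDK will show what the container is currently provisioned at. This is a minimal sketch using the azure-cosmos package; the endpoint, key, and database/container names are placeholders:

    ```python
    # Inspect and (optionally) raise a container's provisioned throughput with
    # the azure-cosmos Python SDK (v4). All names below are placeholders.
    from azure.cosmos import CosmosClient

    client = CosmosClient("https://<your-account>.documents.azure.com:443/", credential="<your-key>")
    container = client.get_database_client("<database>").get_container_client("<container>")

    offer = container.get_throughput()
    print(f"Current provisioned throughput: {offer.offer_throughput} RU/s")

    # A sustained bulk load usually needs far more RU/s than steady-state
    # traffic; scale up for the migration window, then back down to save cost.
    container.replace_throughput(20000)
    ```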

    Have you optimized the schema for Cosmos DB? When using hierarchical partitioning, it's crucial to design the schema and partition keys to align with the data's access patterns. Poor partitioning strategies can lead to uneven data distribution, causing hotspots and delays. Consider revisiting your partitioning strategy and consult Cosmos DB's Partitioning Design Guide for best practices.
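
    If you are creating the hierarchical container yourself, a minimal sketch with the azure-cosmos Python SDK looks like the following (a recent SDK version with MultiHash support is assumed, and the /tenantId and /userId key paths are purely illustrative -- pick paths that match your own access patterns):

    ```python
    # Create a container with a two-level hierarchical partition key using the
    # azure-cosmos Python SDK. Requires a recent SDK version with MultiHash
    # support; all names are placeholders.
    from azure.cosmos import CosmosClient, PartitionKey

    client = CosmosClient("https://<your-account>.documents.azure.com:443/", credential="<your-key>")
    database = client.get_database_client("<database>")

    container = database.create_container_if_not_exists(
        id="orders",
        # "MultiHash" enables hierarchical partition keys; the order of the
        # paths defines the hierarchy, broadest value first.
        partition_key=PartitionKey(path=["/tenantId", "/userId"], kind="MultiHash"),
    )
    ```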

    Are there alternative data ingestion methods to improve performance? Instead of relying solely on ADF, you could explore other tools such as Azure Functions, Spark with Cosmos DB connectors, or bulk executor libraries that might handle bulk data writes more efficiently. The Cosmos DB Bulk Executor Library is particularly useful for large-scale data ingestion and might address your performance concerns.
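
    As one illustration of the Spark route, the sketch below reads from Azure SQL over JDBC and bulk-writes to Cosmos DB with the azure-cosmos-spark connector. It assumes a notebook environment (e.g. Databricks or Synapse Spark) where `spark` is the ambient SparkSession and the connector is installed; connection values and table names are placeholders:

    ```python
    # Read the source table from Azure SQL over JDBC.
    df = (
        spark.read.format("jdbc")
        .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>")
        .option("dbtable", "dbo.SourceTable")
        .option("user", "<user>")
        .option("password", "<password>")
        .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
        .load()
    )

    # Bulk-write to Cosmos DB; bulk mode batches writes and can saturate the
    # container's RU/s far better than row-by-row inserts.
    cosmos_config = {
        "spark.cosmos.accountEndpoint": "https://<your-account>.documents.azure.com:443/",
        "spark.cosmos.accountKey": "<your-key>",
        "spark.cosmos.database": "<database>",
        "spark.cosmos.container": "<container>",
        "spark.cosmos.write.bulk.enabled": "true",
        "spark.cosmos.write.strategy": "ItemOverwrite",  # upsert semantics
    }
    df.write.format("cosmos.oltp").options(**cosmos_config).mode("APPEND").save()
    ```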

    Is your ADF pipeline configured for efficiency? Review the pipeline's performance settings. Ensure that batching is enabled and that the batch size aligns with the throughput capacity of your Cosmos DB container. Adjust the parallelism and retry settings in the copy activity to better manage the data flow. More guidance is available in the Azure Data Factory Performance and Scalability Guide.
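
    For example, the relevant knobs in a copy activity definition look roughly like this (the JSON is abbreviated, and the numbers are illustrative starting points to tune against your container's RU/s, not recommendations):

    ```json
    {
      "name": "CopySqlToCosmos",
      "type": "Copy",
      "typeProperties": {
        "source": { "type": "AzureSqlSource" },
        "sink": {
          "type": "CosmosDbSqlApiSink",
          "writeBehavior": "upsert",
          "writeBatchSize": 10000
        },
        "parallelCopies": 8,
        "dataIntegrationUnits": 16
      }
    }
    ```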

    Have you explored ADF limitations with hierarchical containers? If ADF is unable to load data directly into hierarchical partition-enabled containers, you might need to preprocess the data into a compatible format or use an intermediary staging step. This can be done by reshaping the data into a flat structure before ingestion, or by leveraging a tool that supports hierarchical writes, as in the sketch below. For further assistance, check out the Azure Cosmos DB and ADF Integration Documentation.
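
    A minimal sketch of such a staging step outside ADF, assuming pyodbc on the SQL side and azure-cosmos on the Cosmos side (all connection values, table, and column names are placeholders):

    ```python
    # Read rows from Azure SQL and upsert them into the hierarchical-partition
    # container. Every document must carry a value for each level of the
    # hierarchical partition key (here /tenantId and /userId).
    import pyodbc
    from azure.cosmos import CosmosClient

    sql_conn = pyodbc.connect(
        "Driver={ODBC Driver 18 for SQL Server};"
        "Server=<server>.database.windows.net;Database=<db>;"
        "Uid=<user>;Pwd=<password>;Encrypt=yes;"
    )

    cosmos = CosmosClient("https://<your-account>.documents.azure.com:443/", credential="<your-key>")
    container = cosmos.get_database_client("<database>").get_container_client("<container>")

    cursor = sql_conn.cursor()
    cursor.execute("SELECT Id, TenantId, UserId, Payload FROM dbo.SourceTable")
    for row in cursor:
        container.upsert_item({
            "id": str(row.Id),         # Cosmos DB requires a string id
            "tenantId": row.TenantId,  # level 1 of the hierarchical key
            "userId": row.UserId,      # level 2 of the hierarchical key
            "payload": row.Payload,
        })
    ```

    Row-by-row upserts like this will be slow at the 5-million-row scale, so treat it as a correctness check; for the full load, parallelize the writes or prefer the Spark/bulk route above.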

