CosmosDB Eastus2Euap zone resilience during AZ drill

Tarek Aljabban 0 Reputation points Microsoft Employee
2025-03-18T19:28:55.1766667+00:00

During the AZ DR drill conducted in Eastus2Euap, our service starts failing because it has a regional CosmosDB dependency. Since we don't have CosmosDB AZ capacity in Eastus2Euap, calls to CosmosDB fail during the drill window. This makes the drill of little value to us, because we cannot be resilient to these failures without AZ redundancy.

Any suggestions on actions we can take from our end to maintain resilience during the drill?

Azure Cosmos DB

1 answer

  1. Prasad Chaganti 410 Reputation points Microsoft External Staff
    2025-03-19T00:45:37.21+00:00

    Hi Tarek Aljabban,

    It seems you're facing a major challenge During AZ DR Drill because of the absence of availability zone (AZ) redundancy for CosmosDB in the Eastus2Euap region. Here are some recommendations to help maintain resilience during the drill:

    One of the most effective ways to ensure high availability and resilience is to enable multi-region replication for your CosmosDB account. This replicates your data across multiple regions, so your service can remain available even if one region fails.
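    Once the account is replicated, the client still has to pick a reachable region. The Cosmos DB SDKs handle this through their own preferred-region settings (for example, the `preferred_locations` option of `CosmosClient` in the `azure-cosmos` Python package). The sketch below only models the idea: walk an ordered list of preferred regions and use the first one that holds a replica and is not down. The function name and region names are hypothetical.

```python
from typing import Iterable, Optional, Set

# Hypothetical model of client-side region selection for a multi-region
# Cosmos DB account. Region names and the "unavailable" set are illustrative
# assumptions; the real SDKs do this via their own preferred-region settings
# and endpoint discovery.


def pick_read_region(
    preferred: Iterable[str], replicated: Set[str], unavailable: Set[str]
) -> Optional[str]:
    """Return the first preferred region that holds a replica and is reachable."""
    for region in preferred:
        if region in replicated and region not in unavailable:
            return region
    # No usable replica: with a single-region account this is the drill failure.
    return None


if __name__ == "__main__":
    replicated = {"East US 2", "Central US"}  # account replicated to two regions
    preferred = ["East US 2", "Central US"]
    print(pick_read_region(preferred, replicated, unavailable=set()))
    print(pick_read_region(preferred, replicated, unavailable={"East US 2"}))
```

    With a second replica, losing the primary region shifts reads to the next preferred region instead of failing outright.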

    Azure Cosmos DB supports manual failover, which you can use to simulate a regional outage and test your application's resilience. If your application is set up to handle failovers gracefully, it can remain operational even during a regional outage.
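    On an account with a single write region, the region at failover priority 0 serves writes, and a manual failover promotes another existing region to that position (for example with `az cosmosdb failover-priority-change`). The sketch below models only that bookkeeping and calls no Azure API. The function names are hypothetical, and keeping the remaining regions in their previous relative order is a simplifying assumption.

```python
from typing import List

# Hypothetical model of a manual failover on a single-write-region account.
# Index 0 of the list is the write region (failover priority 0). This does
# not call Azure; it only shows how priorities change when a region is promoted.


def manual_failover(priorities: List[str], new_write_region: str) -> List[str]:
    """Return a new priority list with new_write_region moved to priority 0."""
    if new_write_region not in priorities:
        raise ValueError(f"{new_write_region!r} is not a region of this account")
    # Assumption: the other regions keep their previous relative order.
    return [new_write_region] + [r for r in priorities if r != new_write_region]


def write_region(priorities: List[str]) -> str:
    """The region with failover priority 0 accepts writes."""
    return priorities[0]


if __name__ == "__main__":
    regions = ["East US 2", "Central US", "West US 2"]
    after = manual_failover(regions, "Central US")
    print(write_region(after))
    print(after)
```

    During a drill, the point of such a failover is to confirm that the application keeps working after the write region changes.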

    Ensure that your application has robust retry logic to handle transient failures. This can help mitigate the impact of temporary connectivity issues during the drill.
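    A minimal sketch of application-level retry logic, assuming capped exponential backoff with full jitter is acceptable for your workload. The helper name and the choice of `ConnectionError`/`TimeoutError` as "transient" are assumptions. The Cosmos DB SDKs already retry some failures themselves, such as throttled (429) requests, so an outer loop like this should stay short to avoid multiplying retries.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")

# Hypothetical helper: retries a callable on errors treated as transient,
# sleeping a random time up to a capped exponential delay between attempts.


def retry_with_backoff(
    op: Callable[[], T],
    transient: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except transient:
            if attempt == max_attempts:
                raise  # give up and surface the last transient error
            # Full jitter: uniform sleep in [0, min(max_delay, base * 2^(n-1))].
            delay = min(max_delay, base_delay * (2 ** (attempt - 1)))
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky() -> str:
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("simulated transient failure")
        return "ok"

    print(retry_with_backoff(flaky, sleep=lambda s: None))
```

    Injecting `sleep` keeps the helper testable. Note that retries only help with brief blips; they cannot mask a regional dependency that stays down for the whole drill window.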

    Use Azure Traffic Manager to route traffic to the nearest available region. This can help ensure that your application remains available even if one region is experiencing issues.
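    Routing to the "nearest available region" corresponds to latency-based (Performance) routing combined with health probes. The sketch below models that decision: among healthy endpoints, pick the one with the lowest latency. Endpoint names and latencies are made up. Returning `None` when nothing is healthy is a simplification of this sketch, not a description of Traffic Manager's own behavior.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical model of latency-based routing with health checks.
# It does not reproduce Traffic Manager's latency tables or DNS responses.


@dataclass
class Endpoint:
    name: str
    latency_ms: float
    healthy: bool = True


def route(endpoints: List[Endpoint]) -> Optional[str]:
    """Pick the healthy endpoint with the lowest measured latency."""
    healthy = [e for e in endpoints if e.healthy]
    if not healthy:
        return None  # simplification for this sketch
    return min(healthy, key=lambda e: e.latency_ms).name


if __name__ == "__main__":
    eps = [Endpoint("eastus2", 10), Endpoint("centralus", 30), Endpoint("westus2", 60)]
    print(route(eps))
    eps[0].healthy = False  # e.g. the region under drill fails its probe
    print(route(eps))
```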

    Regularly review and optimize your CosmosDB configuration to ensure that it is set up for high availability and resilience. This includes configuring appropriate consistency levels and partitioning strategies.
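    On consistency levels: Cosmos DB defines five levels, from strongest to weakest: Strong, Bounded Staleness, Session, Consistent Prefix, and Eventual. A client can override the account's default only with a weaker level, not a stronger one. The sketch below encodes that rule. The enum and function names are hypothetical, not SDK types.

```python
from enum import IntEnum
from typing import Optional

# Hypothetical encoding of the five Cosmos DB consistency levels,
# ordered from strongest (0) to weakest (4).


class Consistency(IntEnum):
    STRONG = 0
    BOUNDED_STALENESS = 1
    SESSION = 2
    CONSISTENT_PREFIX = 3
    EVENTUAL = 4


def effective_consistency(
    account_default: Consistency, requested: Optional[Consistency] = None
) -> Consistency:
    """Apply a client override, which may only relax the account default."""
    if requested is None:
        return account_default
    if requested < account_default:  # lower value means stronger
        raise ValueError("cannot request stronger consistency than the account default")
    return requested


if __name__ == "__main__":
    print(effective_consistency(Consistency.SESSION).name)
    print(effective_consistency(Consistency.SESSION, Consistency.EVENTUAL).name)
```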

    For more detailed information, refer to the following documents:

    https://learn.microsoft.com/en-us/azure/reliability/reliability-cosmos-db-nosql

    https://learn.microsoft.com/en-us/azure/cosmos-db/distribute-data-globally

    These steps should help you maintain resilience during your AZ DR drill and keep your service available even in the event of regional failures.

    Hope this helps. If you have any further queries, do let us know.

    If this answers your query, please click "Accept Answer" and "Yes" for "Was this answer helpful?".
