Cosmos DB High Availability Concerns with Multi-Region

Mathew James 421 Reputation points
2022-07-05T03:54:32.863+00:00

Hi Everyone -

Although I have read several pieces of documentation, I still have some concerns regarding Cosmos DB HA with multiple regions.

  1. At present I have Cosmos DB in the West US 2 region, which serves as the main database for my application. West US 2 will be the active region.
  2. We are planning to set up a passive region for high availability in, say, East US. There I am planning to set up my Cosmos DB (as a read region), utilizing service-managed failover in the primary region.

Here are my concerns:

  1. By enabling "Replicate data globally", I understand that any writes that happen in the primary region will be replicated automatically to the secondary region's Cosmos DB as well. (Is this replication almost instantaneous?)
  2. Enable Automatic Failover: does this mean that even if my .NET Core application in the primary region is up and running but my primary region's Cosmos DB is down, automatic failover will connect my .NET Core application in the primary region to Cosmos DB in the secondary region? (Does this mean I don't need to do anything programmatically to point the connection string at the secondary region's Cosmos DB from my application?)
  3. Also, when the secondary region's Cosmos DB is not enabled for multi-region writes, does that mean that when the primary region's Cosmos DB is down and automatic failover has switched to the secondary region, the application cannot write to the secondary region's Cosmos DB?
  4. I am planning to use Azure Front Door Premium for high availability (it is for an internal application with private endpoints, hence Premium). If the primary region goes down and Front Door switches to the secondary region, can my .NET Core application in the secondary region only read from, and not write to, the secondary region's Cosmos DB (if Cosmos DB is not enabled for multi-region writes)?

Appreciate your response..!!

Thanks in Advance!!
-Mathew James

Azure Cosmos DB
An Azure NoSQL database service for app development.

1 answer

  1. Mark Brown - MSFT 2,771 Reputation points Microsoft Employee
    2022-07-05T17:27:39.157+00:00

    Answers to your questions/concerns below.

    1. There is nothing really "instantaneous" in a distributed database. The speed at which data is replicated is governed by the consistency level chosen and by the CAP and PACELC theorems. If you require data to be perfectly consistent (RPO=0) within a distributed database, choose strong consistency. However, you will pay the price for this in higher write latency, as strong consistency requires data to be committed globally in all configured regions synchronously before being recognized as committed. All other consistency levels are asynchronous, with higher RPO but lower write latency. If you're new to this type of database, I'd recommend reading our docs on consistency and global distribution, as these go into very deep detail on how a distributed database behaves nominally, as well as during different types of failures.

    2. The behavior of the service during an outage differs depending on the type of outage and on how and where your account is configured. For details on how the service behaves across different types of outages, see our High Availability docs.

    3. When an account is configured for single-region writes and the primary (read/write) region loses availability, write availability is impacted for that account until the region is marked offline and the account is failed over to the secondary region. More details on this behavior can be found in the High Availability doc above.

    4. Same answer as above. When the primary region for an account configured for single-region writes experiences availability loss, writes are impacted. When using multi-region writes, the Cosmos DB SDK will fail over and write to a secondary region (RTO=0). However, depending on the type of outage, there may be data which is not fully replicated. During any regional outage, users should check the conflict feed to see if any data was not replicated when the outage occurred. Also, when using multi-region writes, depending on the type of conflict resolution configured, checking the conflict feed may be necessary under normal conditions as well, if two clients attempt to write or update the same data simultaneously. For more information, see Manage Conflict Resolution policies in Cosmos DB.
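
To make points 1, 3 and 4 concrete, here is a minimal sketch of a single-write-region account with asynchronous replication. It is a toy in-memory model in Python, not the Cosmos DB SDK or the service's actual implementation: `Region`, `Account`, `replicate` and `fail_over` are hypothetical names invented for illustration, and the failover logic is simplified to "promote the first online read region".

```python
# Toy in-memory model of a single-write-region account. NOT the Cosmos DB SDK.
# Every class and method name here is hypothetical and exists only for illustration.


class Region:
    def __init__(self, name):
        self.name = name
        self.online = True
        self.items = {}  # data committed in this region


class Account:
    def __init__(self, write_region, read_regions, strong=False):
        self.write_region = write_region
        self.read_regions = list(read_regions)
        self.strong = strong  # True: commit in every region before acknowledging
        self.pending = []     # acknowledged writes not yet replicated

    def write(self, key, value):
        if not self.write_region.online:
            # Single-region writes: no other region accepts writes until failover.
            raise RuntimeError(f"write region {self.write_region.name} is unavailable")
        self.write_region.items[key] = value
        if self.strong:
            # Synchronous replication: higher write latency, nothing left pending.
            for region in self.read_regions:
                region.items[key] = value
        else:
            # Asynchronous replication: acknowledged now, shipped later.
            self.pending.append((key, value))

    def replicate(self):
        """Drain the asynchronous replication queue into the read regions."""
        for key, value in self.pending:
            for region in self.read_regions:
                region.items[key] = value
        self.pending.clear()

    def read(self, key):
        """Read from the first online region, write region first."""
        for region in [self.write_region, *self.read_regions]:
            if region.online:
                return region.items.get(key)
        raise RuntimeError("no region available")

    def fail_over(self):
        """Promote the first online read region; return keys that never replicated."""
        new_write = next(r for r in self.read_regions if r.online)
        old_write = self.write_region
        self.read_regions = [r for r in self.read_regions if r is not new_write]
        self.read_regions.append(old_write)
        self.write_region = new_write
        unreplicated = [key for key, _ in self.pending]
        self.pending.clear()
        return unreplicated


west = Region("West US 2")
east = Region("East US")
account = Account(west, [east])

account.write("order-1", "created")
account.replicate()                  # order-1 is now in both regions
account.write("order-2", "created")  # acknowledged, not yet replicated

west.online = False                  # simulated regional outage
try:
    account.write("order-3", "created")
    write_failed = False
except RuntimeError:
    write_failed = True              # writes are impacted until failover

readable = account.read("order-1")   # reads are still served from East US
unreplicated = account.fail_over()   # East US becomes the write region
account.write("order-3", "created")  # writes succeed again
```

In this model, reads keep working from the secondary region during the outage, writes fail until the secondary is promoted, and any write acknowledged but not yet replicated (here `order-2`) is the gap that RPO describes. Constructing the account with `strong=True` leaves nothing pending, at the cost of every write touching every region.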
