Azure CosmosDB Consistency - Understanding Bounded Staleness

Question

Biju Mathew 481

Hi,
I am new to Azure Cosmos DB and am struggling to understand the Bounded Staleness consistency level.

https://learn.microsoft.com/en-us/azure/cosmos-db/consistency-levels

What does the lag in time and updates really mean?

Referring to this link in the Bounded Staleness section, I also need some help understanding this:

Inside the staleness window, Bounded Staleness provides the following consistency guarantees:
1. Consistency for clients in the same region for an account with single write region = Strong
2. Consistency for clients in different regions for an account with single write region = Consistent Prefix
3. Consistency for clients writing to a single region for an account with multiple write regions = Consistent Prefix
4. Consistency for clients writing to different regions for an account with multiple write regions = Eventual

Please can someone explain what the above lines mean.

1. Consistency for clients in the same region for an account with single write region = Strong
Is this a scenario where the read region and the write region are the same?
2. Consistency for clients in different regions for an account with single write region = Consistent Prefix
Is this where reads are from a different region than the writes?

My understanding is that in a multi-write setup, the client will always write to the nearest single region. Then why are 3 and 4 talking about writing to different regions?

Is there any simpler explanation of this out there that I can refer to? A demo video, perhaps?

Any assistance is deeply appreciated.

Thanks

Accepted answer

Answer 1

The lag in bounded staleness refers to the rate at which data is replicated to secondary regions in a distributed database. Unlike other relaxed consistency models, bounded staleness puts an upper bound on the amount of time, or the number of updates, by which data in the secondary replicas can lag behind the write region. When the lag approaches the staleness window, bounded staleness throttles writes so that replication can catch up.
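To make the two limits concrete, here is a minimal sketch of the check described above. It is a toy model, not the service's implementation: the `StalenessBound` class, the `should_throttle_writes` function, and the numbers are all hypothetical, and the real service applies this logic internally rather than exposing it to clients.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StalenessBound:
    """Hypothetical container for the two limits of a staleness window."""
    max_versions: int    # most updates a secondary region may lag behind
    max_seconds: float   # longest time a secondary region may lag behind


def should_throttle_writes(lag_versions: int, lag_seconds: float,
                           bound: StalenessBound) -> bool:
    """Return True once the lag exceeds either limit of the window."""
    return lag_versions > bound.max_versions or lag_seconds > bound.max_seconds


# Illustrative numbers only, not service defaults.
bound = StalenessBound(max_versions=100, max_seconds=5.0)
print(should_throttle_writes(40, 1.2, bound))   # lag is inside the window
print(should_throttle_writes(150, 1.2, bound))  # too many updates behind
print(should_throttle_writes(40, 9.0, bound))   # too far behind in time
```

The point of the sketch is only that the window has two dimensions, and that crossing either one is enough to slow writes down until replication catches up.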

The consistency guarantees are written this way because the behavior appears different depending on where the readers are and where the writers are.

For #1, this applies to scenarios where both the writer and the readers are in the primary write region. (The way this is worded is a bit confusing. Will get that updated.)

Another thing to understand about this scenario: reads for bounded staleness use a minority quorum (2 replicas). This is done to ensure the most recent data is read in the write region. Writes use a majority quorum (3 replicas). A replica set is 4 replicas in a region. When data is read from the two replicas, the LSN for each replica is compared. If they match, the data is guaranteed to be the most up to date and is returned. If they do not match, the data from the replica with the higher LSN is returned because it is the most up to date. This is great for ensuring consistent data, but a read costs 2x that of Session or weaker consistency because you are reading from two replicas.

(PS: this is the way Strong consistency works for reads as well. The difference for Strong consistency is that data is not committed until it has been written in every region. This is why write latency is so high when using Strong consistency.)
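The two-replica read described above can be sketched as a small function. This is only an illustration of the LSN comparison, with hypothetical names (`ReplicaRead`, `quorum_read`); it is not the actual Cosmos DB read path.

```python
from typing import NamedTuple


class ReplicaRead(NamedTuple):
    """Hypothetical result of reading one item from one replica."""
    lsn: int     # log sequence number of the last write the replica has applied
    value: str   # item contents as seen by that replica


def quorum_read(first: ReplicaRead, second: ReplicaRead) -> ReplicaRead:
    """Compare two replica reads and return the most up-to-date one.

    If the LSNs match, both replicas agree and either result is returned.
    If they differ, the result with the higher LSN wins.
    """
    return first if first.lsn >= second.lsn else second


# One replica has already applied write 12, the other is still at write 11.
fresh = ReplicaRead(lsn=12, value="price=20")
stale = ReplicaRead(lsn=11, value="price=15")
print(quorum_read(stale, fresh).value)
```

Because every read touches two replicas, the doubled read cost mentioned above follows directly from this design.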

For #2, that is correct. When the writer is in one region and the reader in another, the reader will see consistent prefix (reads are in the order in which they were written).

For #3 and #4, in a multi-region write scenario, you can have more than one client instance reading and writing in a region. Clients in the same region will see consistent prefix; clients in different regions will get eventual consistency.
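The four cases can also be read as a simple lookup on two questions: does the account have one write region or several, and are the clients in the same region or in different ones? The sketch below just encodes the four lines quoted in the question; the names `GUARANTEES` and `guarantee_inside_window` and the string keys are made up for illustration.

```python
# Keys: (write regions on the account, where the clients are relative to each other).
# Values: the guarantee seen inside the staleness window, as quoted in the question.
GUARANTEES = {
    ("single", "same region"): "Strong",                          # case 1
    ("single", "different regions"): "Consistent Prefix",         # case 2
    ("multiple", "same region"): "Consistent Prefix",             # case 3
    ("multiple", "different regions"): "Eventual",                # case 4
}


def guarantee_inside_window(write_regions: str, client_placement: str) -> str:
    """Look up the guarantee for one of the four documented combinations."""
    return GUARANTEES[(write_regions, client_placement)]


for (writes, placement), level in GUARANTEES.items():
    print(f"{writes} write region(s), clients in {placement}: {level}")
```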

There are some videos which may help here. Key to understanding consistency models is understanding the CAP and PACELC theorems. The segment in this presentation from BUILD in 2019 does a good job of explaining how distributed databases handle all this. https://youtu.be/fOQoQnQqwwU?t=1133

And this is a short video explaining each of the consistency guarantees. https://www.youtube.com/watch?v=t1--kZjrG-o

Mark Brown - MSFT 2,771 Reputation points Microsoft Employee

2020-10-05T15:36:52.557+00:00

Local Minority = 2 replicas
Local Majority = 3 replicas
Global Majority = 3 replicas in all configured regions.

Your understanding of #3 and #4 is correct.

Karl Gardner 195 Reputation points

2024-07-09T03:30:46.36+00:00

Hello @Mark Brown - MSFT ,

Thanks for this explanation. I see this in the documentation: "If the data lag in a region (determined per physical partition) exceeds the configured staleness value, writes for that partition are throttled until staleness is back within the configured upper bound". What does throttling mean in this case? If we write to region 1 and the data read in region 2 then lags beyond the staleness time, are the writes in region 1 just cut off? Can you explain the throttling a bit?

Thanks!

Answer 2

Thanks a lot, @Mark Brown - MSFT. Your response explains a lot. Thanks for offering to update the document. May I also suggest that a deep-dive document, such as a technical whitepaper, be considered?

A couple of items please, if I may.

Can I confirm the quorum sizes?
Local Minority is 2 replicas.
Local Majority is 3 replicas.
Global Majority is 3 replicas within each of the configured regions.

I also want to confirm that I understand #3 and #4 above.

3. Consistency for clients writing to a single region for an account with multiple write regions = Consistent Prefix
As per your response, this applies to a multi-master setup. This is what I understand: if multiple clients update a given record in one region, then readers in other regions will see consistent prefix. Will readers in the same region see the latest values?

4. Consistency for clients writing to different regions for an account with multiple write regions = Eventual
Can I confirm that this is the case where a given record is updated by multiple users who are in different regions, and the readers then get eventual consistency?

Appreciate your response. thanks.

Answer 3

Grigoriev, Nikolai 1

Then what is the equivalent of Cassandra's LOCAL_QUORUM writes/reads in CosmosDB? C* can achieve strong consistency in the same DC while not imposing the penalty of cross-region communications. So, I can have two sets of clients working on their data with strong consistency in two regions while, of course, dealing with eventual consistency in case of accessing the data written in the remote DC.
