Need Consistency levels in Azure Cosmos DB for Postgresql (like in NoSQL)

Konstantin Morozov 0 Reputation points
2024-04-17T14:14:28.9133333+00:00

Hi,

I came across with a data inconsistency in my Citus cluster, when I did classic money transfer test:

init.sql:

create table mytbl (id serial primary key, amount int);
insert into mytbl(amount) select 0 from generate_series(1, 1234567);
select create_distributed_table('mytbl', 'id');

psql "connection string" -f init.sql

upd.sql:

\set idA random(1, 1234567)
\set idB random(1, 1234567)

BEGIN;
update mytbl set amount = amount - 100 where id = idA;
update mytbl set amount = amount + 100 where id = idB;
END;

pgbench "connection string" -f upd.sql -T 300

while running pgbench run query select sum(amount) from mytbl; and get non zero results

Are there any plans to do something with this?

Azure Cosmos DB
Azure Cosmos DB
An Azure NoSQL database service for app development.
1,443 questions
Azure Database for PostgreSQL
0 comments No comments
{count} votes

1 answer

Sort by: Most helpful
  1. GeethaThatipatri-MSFT 27,417 Reputation points Microsoft Employee
    2024-04-17T21:32:03.0966667+00:00

    Hi, @Konstantin Morozov Welcome to Microsoft Q&A thanks for posting your question.

    It's a correct observation that running a zero sum on Citus could result in a non-zero answer, but only under concurrent load. Once the system settles the zero sum can be taken. This is a tradeoff we took in the development of Citus to favour speed over isolation guarantees in its default operation. This is a tradeoff that we have seen work best for most customers. Note that this is only a tradeoff in the zero sum calculation, and cannot cause double spending as long as applicable locks (eg. SELECT FOR UPDATE) are taken in a transaction while validating business logic. More details are explained in the docs: https://docs.citusdata.com/en/v12.1/develop/reference_dml.html?highlight=2pc#updates-and-deletion Many of such violations can be worked around in the application. Instead of keeping a balance you can change the application to create a ledger. The exact details are too much to discuss here, but should be looked into if your application is performing accounting.

    To work around these isolation issues there are two approaches that can be taken, and which one is the best depends on the application you are creating.

    1. Whenever you need a zero-sum you can pause the mutating workload and wait for all 2pc transactions to fully finish. Once the system is settled you can safely take the zero-sum which should reflect the state of the world.
    2. if pausing the mutating workload manually is impossible you can do this from the database by taking an aggressive lock on the table you are trying to zero-sum on. The effect to the application is the same, it will pause the mutating workload from the database. Your application needs to be lenient to this higher increase in latency. To take a lock like this you can use the postgres LOCK TABLE ddl command: https://www.postgresql.org/docs/current/sql-lock.html . When locking a table in SHARE mode you will prevent any concurrent mutations to that table, which allows you, once the lock is acquired, to perform a zero sum calculation.

    Be aware that from the moment you start acquiring the SHARE lock on the table, no new mutations will start. Once any ongoing transactions have finished you will have acquired the lock, still preventing mutations to happen. Throught the time you hold on to the share lock all mutations will be blocked. Once you have obtained the zerosum count you should immediately release the lock by finishing the transaction.

    Regards

    Geetha

    0 comments No comments