Azure SQL: Publish a feed of committed INSERTs with low latency without reading past values used in active transactions

dagss 41 Reputation points
2021-11-24T09:03:15.54+00:00

I'm seeking input / experiences on how to solve the problem of publishing a feed of events (INSERTs into a given table) from a SQL database.

We want to implement an "event feed" so that subscribers can listen for new data. The model is similar to Azure Event Hub, but of course Event Hub isn't transactionally consistent with our SQL data and is thus not suitable for our OLTP business logic. We will have a worker that queries for the top(1000) new rows at a time and publishes these to Event Hub, but we also want to provide APIs from a backend reading from the SQL database for loading events from any point in time.

I.e., we are looking for something like this (or a totally different way of achieving the same thing):

select top(1000) * from MyEvent
where
    EventNumber > @LastEventNumberLoaded
    and EventNumber < minimum_EventNumber_in_active_transactions()
order by EventNumber

(EventNumber could also be Timestamp, or anything really that lets us do this).

The second condition is quite critical; otherwise a reader that always reads the latest data will easily skip data from currently active transactions. For example, if transaction A allocates EventNumber 100 but is still open when transaction B commits EventNumber 101, a reader polling at that moment sees 101, moves its cursor past 100, and never picks up event 100 once A commits.

changetable()?

The first thing we tried is using changetable(). But it seems that changetable() simply loads all changes since the cursor version and then sorts them: the longer the time window, the slower the query. The latency can be many seconds, which is totally unsuitable for our use case.
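
For reference, the query would have roughly this shape (assuming Change Tracking is enabled on MyEvent; EventId stands in for the actual primary key, and @LastSyncVersion is the cursor persisted between polls):

select top(1000) ct.SYS_CHANGE_VERSION, e.*
from changetable(changes MyEvent, @LastSyncVersion) as ct
join MyEvent e on e.EventId = ct.EventId
where ct.SYS_CHANGE_OPERATION = 'I'  -- we only publish INSERTs
order by ct.SYS_CHANGE_VERSION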

rowversion?

The second idea we had is using rowversion. There is a min_active_rowversion() function that does exactly what we need here, making sure that readers don't read too far. If "EventNumber" in the query above has the type rowversion, then we can simply use min_active_rowversion() directly.
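
Concretely, if MyEvent had a rowversion column (called RowVer here purely for illustration), the feed query would become:

select top(1000) * from MyEvent
where
    RowVer > @LastRowVersionLoaded
    and RowVer < min_active_rowversion()  -- no active transaction holds a lower version
order by RowVer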

There are two disadvantages with this, however:

a) The rowversion numbers are not under our explicit control. If we ever need to migrate data to another database or similar, the rowversion counter will be reset; and if we ever need to correct events without publishing them again on the feed, there is no way to do it.

We found some ways to work around this by using an auxiliary table, however...

b) min_active_rowversion() is global to the database. So any writer to the database can stop event consumers from reading new events for long periods, simply by opening a transaction that writes to another table with a rowversion column and then taking a long time before rolling back or committing, as the snippet below demonstrates.
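
For example (SomeOtherTable stands in for any unrelated table that happens to have a rowversion column):

begin tran
    -- Any write that stamps a new rowversion pins min_active_rowversion():
    update SomeOtherTable set SomeColumn = SomeColumn where Id = 1
    -- Until this transaction commits or rolls back, min_active_rowversion()
    -- cannot advance past the version generated here, and the feed query
    -- above stops returning new events.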

Ideally we would have something like min_active_rowversion(), but specific to a single table so that only active transactions writing to that table would block a specific set of readers.

Utilize sys.dm_tran_...?

What we are currently looking into is devising our own scheme using tables in which transactions publish which EventNumber they use, so that readers can compare this value against the currently active transactions read from the sys.* views. I will post a more specific question about this separately.

This looks promising, but will have a distinctive DIY feel to it; I'm hoping there is a more standard way.
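
To make the idea concrete, here is a minimal sketch; the table and column names are hypothetical, and it relies on current_transaction_id() and sys.dm_tran_active_transactions (the latter requires VIEW DATABASE STATE):

-- Inside each writing transaction, right after allocating the number:
insert into EventNumberClaim (TransactionId, EventNumber)
values (current_transaction_id(), @EventNumber)

-- Reader: find the lowest EventNumber still claimed by an active
-- transaction (dirty read so that uncommitted claims are visible):
declare @MinActive bigint = (
    select min(c.EventNumber)
    from EventNumberClaim c with (readuncommitted)
    join sys.dm_tran_active_transactions t
        on t.transaction_id = c.TransactionId)

select top(1000) * from MyEvent
where
    EventNumber > @LastEventNumberLoaded
    and (@MinActive is null or EventNumber < @MinActive)
order by EventNumber

Claims whose transaction has committed or rolled back no longer match the join, so stale rows in EventNumberClaim cannot block readers and only need occasional cleanup.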

Processing transaction log

Another option would be processing the transaction log and assigning EventNumber during that processing. But such features don't seem to be available in Azure SQL Database, only in Managed Instance? We want to use Azure SQL...

Other ideas?

Has anyone already done this? How did you solve it?

Azure SQL Database