Is SQL Server smart enough to use indexed row versions efficiently in multi-join queries?

db042190 1,516 Reputation points
2021-10-08T14:16:19.103+00:00

Hi, we will be on SQL Server 2019 in a few months; I don't know yet whether it will be Enterprise or Standard edition.

I'm a big fan of row versions (cast as bigint) as a better choice than datetimes to index, so that incremental extracts can quickly identify only those records that changed since the last time we ran an extract. Please see the attached code for two sample tables and the control table we use to keep track of the last high key encountered on each table. You can see the first two records being inserted into the control table. Basically, the most current records in this table are "ended" on each extract and two new records with new high keys are inserted.

I'm looking at a relatively deep join right now and have to wonder how smart/efficient SQL Server will be in extracting only the latest changes. In the example I show two tables, but in reality it's n tables that chain to each other through various combos of PKs and pointers, and we can assume the pointers are FKs too. For simplicity, assume each table has a row version cast as bigint, indexed.

the predicate in the join might look something like

where (research1.row_version_number > x and research1.row_version_number <= y) or
(research2.row_version_number > x1 and research2.row_version_number <= y1) or...

and the join to the 2nd table might look something like this...

select ...from research1 r1
join research2 r2
on r1.apointer=r2.pk...

So I'm wondering whether SQL Server will throw its hands in the air and apply the worst strategy in joining the n tables, or apply some clever algorithm that even understands that the nth table changed while the 1st didn't, yet the 1st table's "related" record needs to be extracted anyway. If the answer is yes, is there a term for what it will do in its plan?
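To make the pattern concrete, here is a minimal two-table version of the extract query I'm describing (just a sketch; the @...low/@...high variables are placeholders for the keys we'd read from JobControl):

```sql
-- sketch only: per-table low/high keys come from the JobControl table
declare @r1_low bigint = 0, @r1_high bigint = 100,
        @r2_low bigint = 0, @r2_high bigint = 100;

select r1.pk, r1.aFact, r2.pk, r2.aFact
from dbo.research1 r1
join dbo.research2 r2
  on r1.aPointer = r2.pk
where (r1.row_version_number > @r1_low and r1.row_version_number <= @r1_high)
   or (r2.row_version_number > @r2_low and r2.row_version_number <= @r2_high);
```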

USE [research]
GO

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[research1](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    pk uniqueidentifier,
    aPointer uniqueidentifier,
    [aFact] [int] NULL,
    row_version_stamp rowversion,
    row_version_number as cast(ROW_VERSION_STAMP as bigint),

 CONSTRAINT [PK_research1] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[research2](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    pk uniqueidentifier,
    aPointer uniqueidentifier,
    [aFact] [int] NULL,
    row_version_stamp rowversion,
    row_version_number as cast(ROW_VERSION_STAMP as bigint),

 CONSTRAINT [PK_research2] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO


CREATE unique NONCLUSTERED INDEX [IX_research1_rowversionnumber] ON [dbo].[research1]
(
        ROW_VERSION_NUMBER
)
WITH (PAD_INDEX = ON, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [primary]
GO

CREATE unique NONCLUSTERED INDEX [IX_research2_rowversionnumber] ON [dbo].[research2]
(
        ROW_VERSION_NUMBER
)
WITH (PAD_INDEX = ON, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [primary]
GO


USE [research]
GO

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[JobControl](
    [JobControlId] [int] IDENTITY(1,1) NOT NULL,
    [PackageName] [varchar](100) NULL,
    [ServerName] [varchar](16) NULL,
    [InstanceName] [varchar](16) NULL,
    [DatabaseName] [varchar](128) NULL,
    [SchemaName] [varchar](128) NULL,
    [TableName] [varchar](128) NULL,
    [HighKey] [varchar](100) NULL,
    [StartDate] [datetime2](7) NULL,
    [EndDate] [datetime2](7) NULL,
PRIMARY KEY CLUSTERED 
(
    [JobControlId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY]
) ON [PRIMARY]
GO

declare @dt datetime2(7) = getdate()

insert jobcontrol
(
    [PackageName] ,
    [ServerName] ,
    [InstanceName] ,
    [DatabaseName] ,
    [SchemaName] ,
    [TableName] ,
    [HighKey] ,
    [StartDate] ,
    [EndDate]
)

select 'pkg1','server1',null,'research','dbo','research1','0',@dt,null
union
select 'pkg1','server1',null,'research','dbo','research2','0',@dt,null
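
At the end of each successful extract, the current control records are "ended" and new ones are inserted with the new high keys, along these lines (a sketch; the @newHigh values are placeholders for the max row_version_number observed per table):

```sql
-- sketch: close out the current control rows and open new ones with the new high keys
declare @now datetime2(7) = sysdatetime();
declare @newHigh1 varchar(100) = '123', @newHigh2 varchar(100) = '456';  -- placeholders

update dbo.JobControl
set EndDate = @now
where PackageName = 'pkg1' and EndDate is null;

insert dbo.JobControl
    (PackageName, ServerName, DatabaseName, SchemaName, TableName, HighKey, StartDate)
select 'pkg1', 'server1', 'research', 'dbo', 'research1', @newHigh1, @now
union all
select 'pkg1', 'server1', 'research', 'dbo', 'research2', @newHigh2, @now;
```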
  1. db042190 1,516 Reputation points
    2021-11-02T12:59:45.683+00:00

At my request, Robert Biddle has commented on this chain at Tom's link https://www.mssqltips.com/sqlservertip/3295/using-rowversion-with-sql-server-integration-services-to-reduce-load-times/ . I have a pending question to him there asking that he expound on "decoupled architectures".


  2. db042190 1,516 Reputation points
    2021-10-15T18:01:57.217+00:00

Thanks Tom, I read Biddle's tip and it's almost identical to what I do. I asked him to read the posts here and comment. Erland's points make a lot of sense, especially for the somewhat volatile data capture I'm looking at right now.

    Erland, I see now that change tracking does track inserts and deletes as well. It's an interesting option. I like that the "uncommitted" item has to show up eventually, that you only need to specify the tables you want included in the tracking, and that it runs on Standard edition. I do wonder if our OLTP guys would freak out that I'm suggesting we try a synchronous tracking feature on an app that probably can't afford to perform poorly.

    I'm guessing that when my process wakes up every 15 minutes and needs to read 80k "updates" per table (there are actually only 4 tables), those reads will be sub-second. I'm guessing, without getting into the details, that what I'll have are PKs from that store. I'm guessing that I (or SQL) need to flush that store when I'm done, and that SQL will ensure nothing gets dropped between my read and the restart of tracking.

    My biggest challenge would be knowing which version of a specific PK is the one I want. When timestamps are recorded, they presumably are from when the record was built, not committed. That could be a problem no less worrisome than some of the issues brought up about row versions.
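    For my own notes, the read side of change tracking would look roughly like this (a sketch, assuming tracking has already been enabled on the database and on research1; the CHANGETABLE output carries the PKs plus the operation type, which covers inserts and deletes):

    ```sql
    -- sketch: assumes change tracking was already enabled, e.g.
    --   ALTER DATABASE research SET CHANGE_TRACKING = ON
    --     (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);
    --   ALTER TABLE dbo.research1 ENABLE CHANGE_TRACKING;

    declare @last_sync_version bigint = 0;  -- persisted from the previous run
    declare @current_version   bigint = CHANGE_TRACKING_CURRENT_VERSION();

    select ct.Id,
           ct.SYS_CHANGE_OPERATION,   -- 'I', 'U', or 'D'
           r.pk, r.aFact              -- null for deleted rows
    from CHANGETABLE(changes dbo.research1, @last_sync_version) as ct
    left join dbo.research1 as r
      on r.Id = ct.Id;
    -- after a successful extract, save @current_version as the next @last_sync_version
    ```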


  3. db042190 1,516 Reputation points
    2021-10-15T16:55:56.147+00:00

    Erland, change tracking is interesting. The most obvious question to me has to do with inserts and deletes. Does it track those as well?


  4. db042190 1,516 Reputation points
    2021-10-15T16:28:05.68+00:00

    Thanks Tom, I'll read your link next.

    Erland, I ran that query asking if read committed snapshot is on and got a zero. I'll check later whether that is the RCSI feature (14 bytes per record, virtually no deadlocks, etc.) or the other snapshot feature, which I think is more of a query-by-query thing without the 14-byte overhead per record.
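    For reference, the check I ran was along these lines (a sketch; the two columns distinguish read committed snapshot from the opt-in snapshot isolation setting):

    ```sql
    -- sketch: show both snapshot-related settings for the database
    select name,
           is_read_committed_snapshot_on,   -- 1 = RCSI is on
           snapshot_isolation_state_desc    -- ON = opt-in SNAPSHOT isolation allowed
    from sys.databases
    where name = 'research';
    ```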

    I'll read up on change tracking first. I thought you were talking about CDC.


  5. db042190 1,516 Reputation points
    2021-10-12T17:39:22.05+00:00

    MIN_ACTIVE_ROWVERSION() seems like it would be useful in the same query, to tell me whether a row version lower than the current max is in flux, i.e., not yet committed. If nothing is in flux, it would be greater than the current max.
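    In other words, I could cap each extract's high key just below any in-flight row version, something like this (a sketch; @last_high is a placeholder for the previous high key from JobControl):

    ```sql
    -- sketch: don't extract past any uncommitted rowversion
    declare @last_high bigint = 0;  -- placeholder: previous high key from JobControl
    declare @safe_high bigint = cast(MIN_ACTIVE_ROWVERSION() as bigint) - 1;

    select r1.pk, r1.aFact, r1.row_version_number
    from dbo.research1 as r1
    where r1.row_version_number > @last_high
      and r1.row_version_number <= @safe_high;
    -- persist @safe_high as the next run's @last_high
    ```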