Hi we will be on 2019 in a few months. i dont know whether it will be enterprise or std.
i'm a big fan of row versions (cast as bigint) as a better choice than datetimes to be indexed so we can quickly identify in incremental extracts only those records that changed since last time we ran an an extract. pls see the attached code for 2 sample tables and the control table we use to keep track of the last high key encountered on each table. you can see the first 2 records being inserted to the control table. basically the most current records in this table are "ended" on each extract and 2 new records with new high keys are inserrted.
i'm looking at a relatively deep join right now and have to wonder how smart/efficient sql will be in extracting only the latest changes. in the example i show 2 tables but in reality its n tables that basically chain to each other thru various combos of pk's and pointers and we can assume the pointers are fks too. and for simplicity i'd like to assume they each table has a row version cast as bigint indexed.
the predicate in the join might look something like
where (research1.row_version_number > x and reserach1.row_version_number <= y) or
(research2.row_version_number > x1 and research2.row_version_number <=y2) or...
and the join to the 2nd table might look something like this...
select ...from research1 r1
join research2 r2
on r1.apointer=r2.pk...
so i'm wondering if sql will throw its hands in the air and apply the worst strategy in joining the n tables or try to apply some cool algorithm that even understands the nth table changed but the 1st didnt , yet the 1st table's "related" record needs to be extracted anyway. if the answer is yes, is there a term for what it will do in its plan?
USE [research]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[research1](
[Id] [int] IDENTITY(1,1) NOT NULL,
pk uniqueidentifier,
aPointer uniqueidentifier,
[aFact] [int] NULL,
row_version_stamp rowversion,
row_version_number as cast(ROW_VERSION_STAMP as bigint),
CONSTRAINT [PK_research1] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[research2](
[Id] [int] IDENTITY(1,1) NOT NULL,
pk uniqueidentifier,
aPointer uniqueidentifier,
[aFact] [int] NULL,
row_version_stamp rowversion,
row_version_number as cast(ROW_VERSION_STAMP as bigint),
CONSTRAINT [PK_research2] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE unique NONCLUSTERED INDEX [IX_research1_rowversionnumber] ON [dbo].[research1]
(
ROW_VERSION_NUMBER
)
WITH (PAD_INDEX = ON, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [primary]
GO
CREATE unique NONCLUSTERED INDEX [IX_research2_rowversionnumber] ON [dbo].[research2]
(
ROW_VERSION_NUMBER
)
WITH (PAD_INDEX = ON, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [primary]
GO
USE [research]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[JobControl](
[JobControlId] [int] IDENTITY(1,1) NOT NULL,
[PackageName] [varchar](100) NULL,
[ServerName] [varchar](16) NULL,
[InstanceName] [varchar](16) NULL,
[DatabaseName] [varchar](128) NULL,
[SchemaName] [varchar](128) NULL,
[TableName] [varchar](128) NULL,
[HighKey] [varchar](100) NULL,
[StartDate] [datetime2](7) NULL,
[EndDate] [datetime2](7) NULL,
PRIMARY KEY CLUSTERED
(
[JobControlId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY]
) ON [PRIMARY]
GO
declare @dt datetime2(7) = getdate()
insert jobcontrol
(
[PackageName] ,
[ServerName] ,
[InstanceName] ,
[DatabaseName] ,
[SchemaName] ,
[TableName] ,
[HighKey] ,
[StartDate] ,
[EndDate]
)
select 'pkg1','server1',null,'research','dbo','research1','0',@dt,null
union
select 'pkg1','server1',null,'research','dbo','research2','0',@dt,null