I would do this
BEGIN TRANSACTION
CREATE TABLE dbo.Tbl
(
ID bigint NOT NULL IDENTITY(1, 1)
,DT datetime2(3) NOT NULL
,CONSTRAINT PK_ID PRIMARY KEY CLUSTERED (ID)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 100, OPTIMIZE_FOR_SEQUENTIAL_KEY = ON) ON [PRIMARY]
)
CREATE NONCLUSTERED INDEX IX_Tbl_DT ON dbo.Tbl (DT ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 100)
ROLLBACK
Use SQL 2019, see
I prefer sequence object to identity
Thanks
Also see https://learn.microsoft.com/en-US/troubleshoot/sql/performance/resolve-pagelatch-ex-contention
Because this table, my sample, is trivial.
Normally I have a lot more columns and I do need to query my tables, and I need quite a bit more indexes, so if PK is smaller, then other indexes are smaller too as clustered key is a part of all indexes.
Your situation may be different. If INSERT is your priority, and that is the only thing you care, then having PK on DT, ID will work slightly faster actually
Results were
DIFF_MS, PK = ID (100,000 rows INSERT)
270
vs
DIFF_MS, PK = DT, ID, (100,000 rows INSERT)
196
I ran a test with 1,000, 000 rows
DIFF_MS, PK = ID
2817
DIFF_MS, PK = DT, ID,
2552
therefore, the difference is not that large, but index is built as well
Inserts are the top priority. The selects we're doing as ad-hoc and only few procedures are actually selecting from that table. We do use the ad-hoc selects when we need to research an issue as every process adds some info into that table.
I guess the bottom line is that I'll use dt/id index.
@Naomi
Other thoughts for you, for having ID PK only.
PK should be minimalistic, and all other attributes/columns must depend on the key and only on the key. If ID is the PK, DT is attribute that depends only and only on PK. Any other columns you have should have the same dependency.
Moreover, if you need to change time resolution (don't rule this out) and go from datetime2(3) to datetime2(7), then you can do this and don't have to change your key.
And please use datetime2 (vs datetime).
PK may need to be migrated to any other related tables, if you have such, hence minimalistic requirement. For example, say you have LogHeader and LogDetails entities. Then just ID PK is better suited for this.
Also, according to my test results for pure insert, the difference is very minor, and we are splitting the hairs in this case, trying to rationalize one versus the other purely on the performance merit.
100K records a day is not high volume, let's not kid ourselves, however you only see and concentrate on this aspect only.
Therefore, any minor gains you may gain in pure INSERT speed are not worth design compromises. Changes to table structures are hard later, so make sure you make the best technical decision to start with, that is forward looking to future expansion, and based on solid data modeling.
Have a wonderful day
Thanks
Vladimir
Hi Vladimir,
This particular table is standalone, there are no dependencies on it. It is used purely for auditing and it has only a few columns (process, subprocess, note, type, datetime, id, sp_id, entry type). Each process we're doing is making entries into this table, so it grows and grows. 100K per day was my rough estimation, it may be more than that.
I proposed to archive current data and start fresh, that's why we're thinking of changing the PK index (which is currently on ID column). Do you think clustered PK on Id + separate index on date is a better option? E.g. we can have clustered PK on Id + index on date desc, ID desc.
I need to make a decision ASAP as I already have a script and most likely we would have a deployment this Thursday (tomorrow).
IMHO, yes, it is better, for the above reasons.
I would implement ID as PK and index(es) on other columns
You may also consider building an index on computed column that is 'CAST(DT as date) ' so you can search for given date range, if that is what you do, instead of index on DT(datetiem2)
it will be smaller and may provide all you need. I don't know your exact search requiremnts.
This is what I would go for.
datetime2 with a higher scale than 3 only make sense if you are storing timestamps from an external source. But you are using sysdatetime() which only gives you ms precision. Check this out:
Disclaimer: this is on Windows. It may be different on Linux.
I cannot make up my mind yet. I did change the script to use datetime2(2), but in regards to indexes I just keep PRIMARY KEY CLUSTERED ([admin_audit_dttm], [admin_audit_id]). We did discuss it yesterday and I just don't want to open up a new discussion again. I don't know if we're migrating tomorrow or not.
Sign in to comment
1 additional answer
Sort by: Most helpful
If you need to know the order the rows were inserted in, it is better to rely on the id column I think. This is because time can go backwards in modern computers. That is, a row B inserted after row A, may still have an earlier timestamp, because CPUs may not be in perfect sync.
I saw rows with the same time, that's why I think time, Id may help. We're using plain datetime and GETDATE() for default, so we may expect collisions. I think for our purposes it's not crucial if the rows will be in wrong order of insertion, we don't select from this table except for ad-hoc queries.
Yes, the id is a must, since collision can happen even without clocks going backwards.
A better choice of data type is datetime2(3) and the function sysdatetime(). Then you get 1 ms in resolution (at least in theory). datetime/getdate() only give you 3.33 ms.
Sign in to comment
Activity