Hi guys,
I'm modeling a fact table that has a document number as a degenerate dimension. This number on the source table is a varchar(23) but it's a garanteed string of numbers from 0 to 9 on each character. Since this can be used logically as a primary key of the fact table, I went to check if the sql server int types cover this, but they do not. The biggest int, the BIGINT only covers 19 digits long.
I then noticed I could use a Decimal(23,0). This would help save some space as the varchar(23) has aproximately 25 (each char * 1 byte) + 2 bytes ( the lenght info).
The decimal(23,0) stores data with precision 20-28 as 13 bytes which is the half of it. Customer also told me this document number is changing to 15 chars, but the historical numbers will remain as 23. This means that even with 15 characters, the same decimal(23,0), since it has a fixed precision, will keep it as 13 bytes vs 17 bytes. It's still a gain of 4 bytes, on a 150 million table, which will save >550MB from day 1, and counting.
In the fact table, these records will be updated (existing records) / inserted (new) via SQL Merge, on this PK, so I'm thinking it's worth to change it to a decimal (23,0) instead of the varhcar (23,0), although I would like to hear some opinions and brainstorm a little.
Thank you
Credit to where it's due: https://www.sqlshack.com/understanding-sql-decimal-data-type/