How to generate a unique 20 digit identifier in T-SQL

Anonymous 61 Reputation points
2020-12-11T16:57:10.29+00:00

Hi,

I am using the following T-SQL code to generate unique 20 digit number. This Unique 20 digit number will be used as import id when importing records.

SELECT REPLACE(STR(CAST(CAST(NEWID() AS binary(5)) AS bigint),16), ' ', 0) + 'UKWS'

Is there any way I can test that this will always generate a unique import id?

Let's say that I will import 10 million records in next 5 or 10 years. Is there any chance that above T-SQL will generate duplicate import ids?

Thanks,

Transact-SQL
Transact-SQL
A Microsoft extension to the ANSI SQL language that includes procedural programming, local variables, and various support functions.
4,555 questions
0 comments No comments
{count} votes

Accepted answer
  1. Erland Sommarskog 101.4K Reputation points MVP
    2020-12-11T22:53:00.067+00:00

    You say 20 digits, but it appears that you want 16 digits plus UKWS at the end. And what you generate is a 5-byte random number, so you will only get 12 full digits. The first three will always be a 0, and the fourth will most also be 0.

    Here is a better expression that gives you full 16 random digits:

     SELECT CAST(  CAST(CRYPT_GEN_RANDOM(8) AS bigint) % CAST(1E16 AS bigint) AS CHAR(16)) + 'UKWS'
    

    You need to do some math to find out what the likelihood for duplicates are with 10 million rows. My gut feeling is that it may be OK, but maybe not. Viorel's test showed that 12 digits was not good enough.

    If you go by a sequence as suggested by Tom, it becomes a non-issue. The data type for the sequence in your case should probably be decimal(16,0).

    1 person found this answer helpful.
    0 comments No comments

4 additional answers

Sort by: Most helpful
  1. Viorel 112.5K Reputation points
    2020-12-11T20:22:59.087+00:00

    It is not clear if you are interested in generating unique values or in validation of your method.

    If you did not check it yet, I made a test that inserts 10,000,000 values to a table according to shown approach. The test found 51 collisions; I hope that it was accurate. Therefore the method does not seem fully reliable.

    1 person found this answer helpful.

  2. Tom Phillips 17,716 Reputation points
    2020-12-11T18:35:25.83+00:00
    0 comments No comments

  3. David Browne - msft 3,766 Reputation points
    2020-12-11T23:05:17.783+00:00

    For generated random 64bit (8 byte) values, the chance of a collision within the first 10,000,000 values is less than 1/100,000.

    See eg https://preshing.com/20110504/hash-collision-probabilities for an overview of the math and this handy table:

    47552-image.png

    This is well below the threshold you really need to worry about, as an occasional collision would generate an isolated Primary Key violation failure.


  4. Anonymous 61 Reputation points
    2020-12-12T13:00:26.15+00:00

    Hi, thank you all for your answers. I think I am one step closer in resolving this. I can use CRYPT_GEN_RANDOM(9) to generate unique 20 digit for the import id. It doesn't need 'UKWS' at the end but it must be unique for a given transaction id. The import id column is a varchar column of 20 character length. It's already predefined so I cannot alter the datatype or the length.

    Now I need to populate the ImportID. But where the transaction id is the same it needs to have the same import id. So for example, where transactionid is 4001 it needs to have the same import id for all rows whrere the transactionid is 4001. Please adivse how I can achieve this using T-SQL query. Thanks.

    47641-image.png