SQL Server What CHECKSUM does

T.Zacks 3,996 Reputation points
2022-01-03T19:42:00.257+00:00

I issued below sql and saw collection of number return by select query instead of hashing

SELECT CHECKSUM(N'Sample String to hash')

so CHECKSUM and HASHBYTES both are different ?

when to use HASHBYTES and when to use CHECKSUM ?

please share the knowledge. thanks

Developer technologies | Transact-SQL
0 comments No comments
{count} votes

Accepted answer
  1. Erland Sommarskog 121.8K Reputation points MVP Volunteer Moderator
    2022-01-03T22:14:12.987+00:00

    CHECKSUM computes a 32-bit integer value by quite simple-minded XOR-algorithm. hashbytes uses a more sophisticated algorithm, and the value is longer.

    If you want a unique hash value to use for comparison rather than comparing the base values, you cannot use checksum, because the risk for collisions is too big.. This is both due to the simplistic algorithm and the short length of the value. Algorithm aside, if you draw 100000 random 32-bit numbers, the probability for at least one collision is 0.3. And if you draw a million numbers, the probability that all numbers are unique is in the range 1E-50.

    I had a case once where I used checksum with some success. There was a table with a number of attributes. There were reason to assume that the same set of value would reappear many times. So I added a checksum column, which I used as one of the keys. If two sets of values would have the same checksum, I had code to handle the collision. But I don't think this is the common case where people ask about checksum.

    0 comments No comments

2 additional answers

Sort by: Most helpful
  1. Tom Phillips 17,771 Reputation points
    2022-01-03T19:57:42.147+00:00
    1 person found this answer helpful.
    0 comments No comments

  2. LiHong-MSFT 10,056 Reputation points
    2022-01-04T05:52:22.79+00:00

    Hi,@T.Zacks
    Please check the Docs of CHECKSUM & HASHBYTES
    It is recommended that if you pay more attention to speed if relative accuracy, use CHESKSUM instead of BINARY_CHECKSUM. If you pay more attention to accuracy in relative speed, use HASHBYTES.

    And my personal opinion: Checksum can only guarantee that the same data will get the same check code, but it cannot guarantee that different data will get different check codes, that is, there is a possibility of duplication. You should be careful when you compare checksum values.

    Best regards,
    LiHong


    If the answer is the right solution, please click "Accept Answer" and kindly upvote it. If you have extra questions about this answer, please click "Comment".

    Note: Please follow the steps in our Documentation to enable e-mail notifications if you want to receive the related email notification for this thread.

    1 person found this answer helpful.
    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.