how to get unique hash values for a column in dataflow mapping

Gokulavasan S 0 Reputation points
2023-12-14T09:37:57.9966667+00:00

hash

Hey, I want to hash an ID column which is in string format. I used the crc32 function in a mapping data flow, but I am getting the same hash value for 2 different inputs. What else can I use to get a unique hash value for every unique ID?
Thanks.

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
11,623 questions

1 answer

  1. phemanth 15,755 Reputation points Microsoft External Staff Moderator
    2023-12-14T11:57:21.3466667+00:00

    @Gokulavasan S

    Thanks for reaching out to MS Q&A.

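    Use a different hashing algorithm: crc32 produces only a 32-bit value, so collisions between different inputs become likely once you hash more than a few tens of thousands of IDs. Try a hash with a larger output such as SHA-256 or MD5; in mapping data flows these are available as the sha2 and md5 expression functions, for example sha2(256, id), where id stands for your column. No hash function can guarantee a unique value for every input, but with a 256-bit hash an accidental collision is practically negligible.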

    • UUID3 and UUID5 in Python: uuid3 uses the MD5 hash, and uuid5 the SHA-1 hash, of a namespace combined with a string to generate an ID for that string. The result is deterministic, not random: the same namespace and string always give the same ID.
    • Polynomial Rolling Hash Function: This is a widely used method to define the hash of a string. It is defined as hash(s) = (s[0] + s[1] * p + s[2] * p^2 + ... + s[n-1] * p^(n-1)) mod m, where p and m are chosen positive numbers. Because the result is always smaller than m, collisions remain possible.
    • SHA256 Hash: You can convert the input string to a byte array and compute the hash.
    • String HashCode: This method involves using the hashCode function on the string, its reverse, and its halves. However, it’s important to note that even ideal hash functions have a chance of collision.
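    The options in the list above can be sketched in Python as follows. This is only an illustration under stated assumptions, not data flow code: zlib.crc32 is a stand-in for a 32-bit checksum and may not match the output of the data flow crc32 function, and the function names, the choice of uuid.NAMESPACE_OID, and the defaults p = 31 and m = 10**9 + 9 are all picked for the example.

```python
import hashlib
import uuid
import zlib


def crc32_hash(value: str) -> int:
    """32-bit checksum: small output space, so collisions are likely at scale."""
    return zlib.crc32(value.encode("utf-8"))


def sha256_hash(value: str) -> str:
    """Convert the string to bytes and return the SHA-256 digest as hex."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def uuid5_hash(value: str, namespace: uuid.UUID = uuid.NAMESPACE_OID) -> str:
    """Deterministic UUID (SHA-1 based) for a namespace + string pair.

    NAMESPACE_OID is an arbitrary choice for this sketch.
    """
    return str(uuid.uuid5(namespace, value))


def polynomial_hash(value: str, p: int = 31, m: int = 10**9 + 9) -> int:
    """hash(s) = (s[0] + s[1]*p + ... + s[n-1]*p^(n-1)) mod m."""
    result = 0
    power = 1  # holds p^i mod m for the current position i
    for ch in value:
        result = (result + ord(ch) * power) % m
        power = (power * p) % m
    return result


if __name__ == "__main__":
    sample_id = "customer-0001"  # hypothetical ID value
    print(crc32_hash(sample_id))
    print(sha256_hash(sample_id))
    print(uuid5_hash(sample_id))
    print(polynomial_hash(sample_id))
```

    All four functions are deterministic, so the same ID always maps to the same value; they differ mainly in how large the output space is, which is what drives the collision probability.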

         

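    Add a salt value: A salt is a string that is added to the input before hashing. A salt does not by itself reduce collisions between different IDs; its main purpose is to make the hashes harder to guess or reverse. If you use one, keep the same salt for every row, because a different salt per row would give the same ID different hash values. In a mapping data flow, a salted SHA-256 hash could be written as an expression such as sha2(256, concat('mySalt', id)), where 'mySalt' and id are placeholders for your own salt and column.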
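    To make the idea of salting concrete, here is a minimal Python sketch (not Azure Data Factory code). The SALT value and the function name salted_sha256 are made up for the example, and the salt is simply prepended to the ID before hashing.

```python
import hashlib

# Hypothetical fixed salt; in practice it would come from a secret store.
SALT = "example-salt"


def salted_sha256(value: str, salt: str = SALT) -> str:
    """Prepend the salt to the value and return the SHA-256 digest as hex."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # The same ID with the same salt always gives the same hash.
    print(salted_sha256("id-001") == salted_sha256("id-001"))
    # Changing the salt changes the hash of the same ID.
    print(salted_sha256("id-001") == salted_sha256("id-001", salt="other"))
```

    With a fixed salt the mapping from ID to hash stays stable, so hashed columns can still be joined or compared across runs.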

    Use a combination of hashing and encryption: You can hash the ID column using SHA-256 and then encrypt the hash value using AES-256. Note that encryption does not reduce collisions, because two inputs with the same hash are already indistinguishable before the encryption step; it only helps if the hash values themselves need to be protected.

    Check for duplicates: You can try checking for duplicates in the ID column before hashing. If there are duplicates, you can add a unique identifier to each duplicate value before hashing.
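    As a rough illustration of this check, the Python sketch below finds IDs that occur more than once and, separately, distinct IDs that end up with the same hash. The helper names find_duplicate_ids and find_hash_collisions are hypothetical, and the IDs are assumed to fit in memory as a plain list.

```python
import hashlib
from collections import Counter, defaultdict


def sha256_hex(value: str) -> str:
    """SHA-256 digest of a string as hex."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def find_duplicate_ids(ids):
    """Return the IDs that occur more than once, sorted."""
    counts = Counter(ids)
    return sorted(value for value, n in counts.items() if n > 1)


def find_hash_collisions(ids, hash_fn=sha256_hex):
    """Return {hash: [distinct IDs]} for hashes shared by different IDs."""
    buckets = defaultdict(set)
    for value in ids:
        buckets[hash_fn(value)].add(value)
    return {h: sorted(vals) for h, vals in buckets.items() if len(vals) > 1}


if __name__ == "__main__":
    ids = ["a1", "b2", "a1", "c3"]  # hypothetical ID column
    print(find_duplicate_ids(ids))    # repeated IDs hash identically by design
    print(find_hash_collisions(ids))  # distinct IDs sharing a hash, if any
```

    Separating the two cases helps with diagnosis: repeated IDs producing the same hash is expected behaviour, while distinct IDs sharing a hash points to a hash function with too small an output.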

    Hope this helps. Do let us know if you have any further queries.
