question

ahmedsalah-1628 avatar image
0 Votes"
ahmedsalah-1628 asked TomPhillips-1744 commented

can i store hashbyte on field with length smaller than fields i concatenate it?

I work on sql server 2019 i have table have multi column string concatenation as StrSubstance,strmass ,strcasnumber

i need to create filed hashchemical with nvarchar(700) and this will store hash for 3 columns concatenation with each other

are this possible

what i mean

 alter table [dbo].[fmdchemical] add hashchemical nvarchar(700) null
    
 update ch  set ch.hashchemical =HASHBYTES('SHA2_512',concate(StrSubstance,strmass,strcasnumber)) from [dbo].[fmdchemical] ch


so are hashing will be correct with nvarchar(700) and every field have length 3500

are this will make issue on the feature with big counts

also are hash every column alone then compare it or concate all these columns on one column then compare





 CREATE TABLE [dbo].[fmdchemical](
      [id] [int] IDENTITY(1,1) NOT NULL,
      [chemicalid] [int] NULL,
      [StrSubstance] [nvarchar](3500) NULL,
      [strmass] [nvarchar](3500) NULL,
      [strcasnumber] [nvarchar](3500) NULL
  ) ON [PRIMARY]
        
  GO
  SET IDENTITY_INSERT [dbo].[fmdchemical] ON 
        
  INSERT [dbo].[fmdchemical] ([id], [chemicalid], [StrSubstance], [strmass], [strcasnumber]) VALUES (826807, 748787, N'3P04,AL2O3,C,C,Cr203,Cu,Cu,Fe,Mn,Others,others,others,Others,Pb,Si02,SiC,Silicone Resin,SiO2,Sn,Sn,Ti02', N'0.0004,0.0096,0.336,0.3992,0.5904,0.764,0.7904,1.5408,1.6176,1.6808,1.6808,2.9344,5.2392,7.2,7.9608,19.2072,32.08,65.3584,148.3384,232.7992,269.472', N'592910,592960,592961,593043,593043,593259,593290,593306,593306,593495,593501,593501,593505,593505,621442,621442,621442,621442,622089,624020,650559')

sql-server-generalsql-server-transact-sql
· 5
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

i need to compare string exist on 3 column StrSubstance,strmass,strcasnumber

with same fields on another table to get chemicalid

so are hashing is best way for that

also are varbinary 32 is best from varchar(200)

0 Votes 0 ·
NaomiNNN avatar image NaomiNNN ahmedsalah-1628 ·

Since your columns are so wide, I think hashing may help. If you store hash('sha2_256', column1+'|'+column2 +'|' + column3) in both tables, you would be able to use that value for comparison. I believe varbinary(32) is all you need.

0 Votes 0 ·

only small quetions
are using sha2_256 is best from SHA2_512
also are hashing will increase more size or it have fixed size

0 Votes 0 ·
Show more comments
NaomiNNN avatar image
0 Votes"
NaomiNNN answered NaomiNNN edited

You can use varbinary(32) to store the hash value in a table of your concatenated columns if you're using SHA2_256 algorithm. What exactly is the purpose and what do you want to compare?

Check https://docs.microsoft.com/en-us/sql/t-sql/functions/hashbytes-transact-sql?view=sql-server-ver15

5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

TomPhillips-1744 avatar image
0 Votes"
TomPhillips-1744 answered TomPhillips-1744 commented

The output of HASHBYTES is a varbinary of a size based on the algorithm used. The output size has no relationship to the input size.

See:
https://docs.microsoft.com/en-us/sql/t-sql/functions/hashbytes-transact-sql?view=sql-server-ver15

The output conforms to the algorithm standard: 128 bits (16 bytes) for MD2, MD4, and MD5; 160 bits (20 bytes) for SHA and SHA1; 256 bits (32 bytes) for SHA2_256, and 512 bits (64 bytes) for SHA2_512.

· 5
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

if i replace varbinary(64) with varchar(64)
are will have side affects
or both same

0 Votes 0 ·

if i replace varbinary(64) with varchar(64)

And what should that be good for storing a binary as string. Depending on the used collation you will get wrong results if you compare strings instead real binary values.
0 Votes 0 ·

HashBytes does not return a char string. This not human readable or usable. There is no reason to convert the varbinary to a varchar.

0 Votes 0 ·

I re-read documentation one more time and a few things are not clear.

  1. Are the varchar(max)/nvarchar(max) values allowed as input in SQL 2019?

  2. It said the return value is maximum 8000 bytes which is also not clear as I see that one is varbinary(32), the other is varbinary(64). Where this 8000 for return value comes from?

0 Votes 0 ·

I believe "max 8000" is reserved for future algorithms.

0 Votes 0 ·
BertZhoumsft-7490 avatar image
0 Votes"
BertZhoumsft-7490 answered BertZhoumsft-7490 edited

Hi,@ahmedsalah-1628

Welcome to Microsoft T-SQL Q&A Forum!

It's not very clear why you want to use Varchar , HashBytes can only accept up to 8000 characters , obviously varbinary(64) is better . If you want to create a hash of a string larger than the limit, you will have to resort to the CLR or chunk the string .
I have done this in T-SQL using :
1) Cut the string into 8000/4000 characters;
2) Hash each block;
3) Concatenate the hashes and test the length;
4) Hash join, repeat chop if limit is exceeded;
5) End with a hash at the end.

Best regards,
Bert Zhou


If the answer is the right solution, please click "Accept Answer" and kindly upvote it. If you have extra questions about this answer, please click "Comment".
Note: Please follow the steps in our Documentation to enable e-mail notifications if you want to receive the related email notification for this thread.



5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.