byNames will return null if a column does not exist. nulls do count towards hash computations and are not ignored. It is not a concatenated string representation of all column values but more of a byte representation of those values.
SHA256 produces hashes for columns that don't exist
In both cases I am deriving a column like this, within a dataflow:
sha2(256, byNames($hashcols))
I define hashcols
as the parameter
hashcols = ['FirstName','Date of Birth','Ingest_TS']
The Derive Column step in my dataflow produces hashes, as expected. Whether those are the actual SHA256 hashes seems in question.
That's because, If I change that param to be
hashcols = ['FirstName','Date of Birth','Ingest_Date']
I still get hashed output, but different hashes - even though "Ingest_Date" doesn't exist as a column.
The same problem obtains if I hash the array directly:
sha2(256, byNames(['FirstName','Date of Birth','Ingest_Date']))
How can I have any faith that the actual columns are being used to generate the output hash, and not some string or array of strings? The hashes vary across different inputs, so it doesn't seem like the strings are being used as constants each time - but still the problem remains that a non-existent column is used and producing a hash in some way, and that output varies across inputs. How would I paste the 3 values referenced by $hashcols
into something like https://passwordsgenerator.net/sha256-hash-generator/ to verify the correctness of the output?
2 answers
Sort by: Most helpful
-
Kiran-MSFT 691 Reputation points Microsoft Employee
2021-06-17T04:14:08.967+00:00 -
MartinJaffer-MSFT 26,081 Reputation points
2021-06-23T00:07:03.873+00:00 The original problem was that it was trying to work with bad column names, right @Jeff vG ?
How about this expression to scrub out all column names which do not exist in a stream?
mapIf($badnames, in(columnNames('source1'),#item), #item)