I want to create a dynamic Expression in the ADF Dataflow by concate all the column except one column and create the hash value

Jaganathan, NK. (Naveen Kumar) 0 Reputation points
2023-09-08T17:06:47.7766667+00:00

Hi All,

i want to create a Dynamic Dataflow expression that should perform trim and coalesce with blank like below for each column ......
coalesce(trim(<field>),'')

If the column type is non-string it should perform below...

coalesce(trim(toString(<field)),'')

And finally all columns should be concatenated and do md5 hashing... Everything has to dynamic as i cannot hardcode any of the column name as the dataflow should handle over 10k inputs with this logic....

Eg... If my Input1 has 3 columns

Field1 --> string

Field 2 --> Long

Field 3 --> Date

My Expression should be like....
md5(concat(coalesce(trim(Field1),''),coalesce(trim(toString(Field2)),''),coalesce(trim(toString(Field3)),'')))

Please suggest...

Thanks

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
11,631 questions
{count} votes

2 answers

Sort by: Most helpful
  1. AnnuKumari-MSFT 34,556 Reputation points Microsoft Employee Moderator
    2023-09-11T17:13:06.76+00:00

    Hi Jaganathan, NK. (Naveen Kumar) ,

    Welcome to Microsoft Q&A platform and thanks for posting your query here.

    I understand that your requirement is to generate hash value for columns present in your dataset dynamically.

    For that, you can use columns() function in dataflow, it dynamically gets all the columns present in the dataset.

    Additionally, you mentioned that you want to exclude one column , instead of using column() function , you can also create a parameter in the dataflow and define the desired column names in that parameter and use that parameter in hash function such as (sha2, md5 etc)

    For implementation, kindly visit the below video:

    Find Row Uniqueness through Hashing with ADF and Synapse

    Hope it helps. Kindly accept the answer by clicking on Accept answer button. Thankyou


  2. Jaganathan, NK. (Naveen Kumar) 0 Reputation points
    2023-09-14T10:20:39.1266667+00:00

    Hi Annu,

    Defining the column names in the parameter will be difficult as i have around 10k tables and each table will have 50+ columns. I have tried the column function but there are two point to be handled in my requirement ,

    1 --> Exclude a column (which is fixed across all tables lets say the column as hash_key)

    2 --> As i mentioned above, based on the type we might need to some cleansing and then concat the values.

    For String Column --> coalesce(trim(bocd_name_1),'')

    For Integer/Long Column --> coalesce(trim(toString(bocd_check_dig)),'')

    For Date column --> coalesce(trim(iif(year(date_column) == 9999 && length(toString(date_column)) == 19,concat(toString(date_column),'.000000'),rpad(toString(date_column),26,'0'))),'')

    I am expecting any Expression that can handle this dynamically.

    Thanks,

    Naveen

    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.