How to leverage CheckMD5 to ensure today's file looks different from yesterday's file

Moore, Payton E 101 Reputation points
2021-06-10T20:52:01.503+00:00

I am working in Azure Synapse Analytics and want to ensure that today's file is different from yesterday's file in my pipeline. My naming convention for files is filename1YYYYMMDD.csv. I want to use MD5 to check that the two files are not the same, but I am having trouble comparing across directories and dynamically locating yesterday's file. Today's file will be in the 'Incoming' directory, whereas yesterday's file will be in the 'Archive' directory, which breaks down into subdirectories based on the Year, Month, and Day of data processing. My pipeline has a 'Get Metadata' activity on the 'Incoming' directory and a 'Get Metadata' activity on the 'Archive' directory, followed by an If Condition activity to compare the two. Another concern is that I want to compare filename1 with filename1 and filename22 with filename22 (that is, excluding the YYYYMMDD appended to the file names). Any suggestions on my approach?

Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. HimanshuSinha-msft 19,381 Reputation points Microsoft Employee
    2021-06-17T22:20:41.72+00:00

    Hello @Moore, Payton E,

    Thanks for the question and for using the Microsoft Q&A platform.
    The md5 function is not supported in pipeline expressions, but it is supported in mapping data flows. Since you have .csv files, you can implement the check at the column level. Please read about it here.
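To make the idea of the check concrete, here is a minimal Python sketch of comparing two files by their MD5 digests. It is only an illustration that could run outside the pipeline (for example in a notebook or a local script); it is not the mapping data flow md5 function, and the helper names `md5_of_file` and `files_differ` are hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_differ(today_path, yesterday_path):
    """True when the two files have different MD5 digests (different bytes)."""
    return md5_of_file(today_path) != md5_of_file(yesterday_path)
```

Calling `files_differ` with the path of today's file and the path of yesterday's archived file returns `False` when the two files are byte-for-byte identical. Note that this compares whole files, whereas the mapping data flow approach above works at the column level.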

    On the second ask, there are many ways to do this, but I like the dynamic expression below. The logic is pretty straightforward: read the total length of the filename and subtract 12 (the length of "YYYYMMDD.csv") from it.

    @substring(pipeline().parameters.parameter1,0,sub(length(pipeline().parameters.parameter1),12))

    Note: while testing, I passed the filename as a parameter.
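For readers who want to sanity-check the arithmetic, the same substring logic can be sketched in Python. This is only a mirror of the expression above under the assumption that every file name ends in an 8-digit date followed by ".csv"; the function name `base_name` and the sample file names are hypothetical.

```python
# "YYYYMMDD.csv" is 8 date characters plus 4 characters for ".csv".
SUFFIX_LEN = len("YYYYMMDD.csv")  # 12


def base_name(file_name):
    """Drop the trailing 12 characters, like substring(name, 0, length - 12)."""
    if len(file_name) < SUFFIX_LEN:
        raise ValueError("file name is shorter than the expected date suffix")
    return file_name[: len(file_name) - SUFFIX_LEN]


print(base_name("filename120210610.csv"))   # filename1
print(base_name("filename2220210610.csv"))  # filename22
```

Because the cut is made from the end of the name, filename1 and filename22 keep their own prefixes even though the prefixes have different lengths.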

    Please do let me know how it goes.
    Thanks
    Himanshu