Hi @Subin Pius,
Thank you for using MS Q&A.
I think you can do the following:
Option #1: Add an updated_time to each record. When the Consumer process picks up the data from the data lake, it sorts the records by updated_time for each recordId and processes only the latest row for that recordId.
Option #2: If it's a full load each time, you can put a timestamp in the filename, e.g. 20210607 (that is, yyyyMMdd),
or you can maintain a folder hierarchy to save the CSV file. Then let the Consumer process pick up only the latest file.
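In case it helps, here is a minimal Python sketch of both options. It is only an illustration, not tied to any particular data lake SDK: the function names `latest_per_record` and `latest_file` are made up for this example, and it assumes updated_time is stored as an ISO-8601 string and that each filename contains an eight-digit date such as 20210607.

```python
import re
from datetime import datetime


def latest_per_record(rows):
    """Option 1: keep only the newest row for each recordId.

    Assumes each row is a dict with 'recordId' and an ISO-8601
    'updated_time' string (an assumption made for this sketch).
    """
    latest = {}
    for row in rows:
        key = row["recordId"]
        ts = datetime.fromisoformat(row["updated_time"])
        # Replace the stored row only when this one is strictly newer.
        if key not in latest or ts > latest[key][0]:
            latest[key] = (ts, row)
    return [row for _, row in latest.values()]


# Matches the first run of eight digits in a filename, e.g. 20210607.
DATE_IN_NAME = re.compile(r"(\d{8})")


def latest_file(filenames):
    """Option 2: return the filename carrying the most recent yyyyMMdd date.

    Files without an eight-digit date are ignored; returns None if
    no filename contains one.
    """
    dated = []
    for name in filenames:
        match = DATE_IN_NAME.search(name)
        if match:
            file_date = datetime.strptime(match.group(1), "%Y%m%d")
            dated.append((file_date, name))
    if not dated:
        return None
    return max(dated)[1]
```

For example, the Consumer could call `latest_file` on the list of filenames in the landing folder and read only the file it returns, or load everything and pass the rows through `latest_per_record` before processing.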
Hope this will help. Thanks!
--Nasreen