Data Flow with ADLS source missed some partitions
We're using Data Factory to synchronise data from a Synapse Link data lake to an Azure SQL database. When the process runs, a mapping Data Flow is called for each table. The Data Flow filters rows based on the maximum SinkModifiedOn value from the previous run.
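For readers unfamiliar with this pattern, here is a minimal Python sketch of a watermark filter of the kind described. It is not the actual Data Flow expression. The function names and the dictionary row shape are assumptions made for illustration; only the SinkModifiedOn column name comes from the setup above.

```python
from datetime import datetime

def rows_changed_since(rows, watermark):
    """Keep rows whose SinkModifiedOn is strictly later than the watermark.

    `rows` is assumed to be a list of dicts with a datetime under
    the "SinkModifiedOn" key (hypothetical shape, for illustration only).
    """
    return [r for r in rows if r["SinkModifiedOn"] > watermark]

def next_watermark(rows, previous):
    """Return the largest SinkModifiedOn seen, or the previous watermark
    unchanged when no rows were processed."""
    return max((r["SinkModifiedOn"] for r in rows), default=previous)

# Made-up sample rows, not taken from the affected table.
sample = [
    {"id": 1, "SinkModifiedOn": datetime(2023, 5, 1, 10, 0)},
    {"id": 2, "SinkModifiedOn": datetime(2023, 5, 2, 9, 30)},
    {"id": 3, "SinkModifiedOn": datetime(2023, 5, 3, 8, 15)},
]
last_run = datetime(2023, 5, 1, 12, 0)
changed = rows_changed_since(sample, last_run)
new_watermark = next_watermark(changed, last_run)
```

Note that with a filter like this, any row skipped in one run is not retried later once the watermark has moved past its SinkModifiedOn value, which is why a single incomplete run leaves rows permanently out of date.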
Database users told us that rows in one of the tables were out of date, and we traced this back to a particular run of the Data Flow. I ran the Data Flow in debug mode to try to reproduce what that run did.
In debug mode, the data preview returned 172 rows. When it ran for real, it picked up only 139 rows. Drilling down into the run statistics, I can see that the yearly partitions 2017 to 2023 account for these 139 rows. So it appears that partitions 2009 to 2016 were not picked up, although there are changes in these partitions. No errors were reported by the Data Flow or by the ADF Pipeline that triggers it.
Why would this be? I have not seen this issue on any other Data Flow runs for this table.
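One way to make the gap explicit is to compare per-partition row counts from the debug preview against those from the real run. The sketch below is a hedged illustration only: the `partition_year` key, the helper names, and the sample counts are all hypothetical and are not the figures from the affected run.

```python
from collections import Counter

def rows_per_partition(rows, key="partition_year"):
    """Count rows per partition. `key` is a hypothetical field name."""
    return Counter(r[key] for r in rows)

def missing_rows_by_partition(expected, actual):
    """Return {partition: shortfall} for every partition where the real run
    picked up fewer rows than expected, including partitions it skipped."""
    return {
        p: expected[p] - actual.get(p, 0)
        for p in sorted(expected)
        if expected[p] > actual.get(p, 0)
    }

# Made-up data: the debug preview saw three partitions, the real run only one.
debug_rows = [
    {"partition_year": 2015}, {"partition_year": 2015},
    {"partition_year": 2016},
    {"partition_year": 2017}, {"partition_year": 2017},
]
real_rows = [{"partition_year": 2017}, {"partition_year": 2017}]

shortfall = missing_rows_by_partition(
    rows_per_partition(debug_rows), rows_per_partition(real_rows)
)
```

A non-empty result that covers whole partitions, as here, points at partition discovery or file listing on the source rather than at the row filter itself.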
Thanks for the additional details. Sorry, it's hard to pinpoint the root cause of the issue from here. Support engineers can troubleshoot further by looking into the backend logs and the pipeline details. I hope they can find the root cause.