Data Flow with ADLS source missed some partitions

Robinson, Andrew 0 Reputation points
2023-07-13T09:24:18.51+00:00

We're using Data Factory to synchronise data from a Synapse Link Data Lake to an Azure SQL database. When the process runs, a mapping Data Flow is called for each table. The Data Flow filters rows based on the maximum SinkModifiedOn value from the previous run.
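To make the filter logic concrete, here is a minimal sketch in Python rather than in the Data Flow expression language. It assumes the filter keeps rows whose SinkModifiedOn is strictly later than the stored maximum; the function names, the `Id` column, and the sample rows are hypothetical and are not taken from the actual pipeline.

```python
from datetime import datetime


def filter_changed_rows(rows, watermark):
    """Keep rows modified after the previous run's maximum SinkModifiedOn.

    Assumption: a strict '>' comparison; the real Data Flow may differ.
    """
    return [r for r in rows if r["SinkModifiedOn"] > watermark]


def next_watermark(rows, previous):
    """Return the new maximum SinkModifiedOn, or the old one if no rows exist."""
    return max((r["SinkModifiedOn"] for r in rows), default=previous)


# Illustrative rows only, not data from the affected table.
rows = [
    {"Id": 1, "SinkModifiedOn": datetime(2023, 7, 1, 8, 0)},
    {"Id": 2, "SinkModifiedOn": datetime(2023, 7, 12, 9, 30)},
]
watermark = datetime(2023, 7, 10)

changed = filter_changed_rows(rows, watermark)
new_watermark = next_watermark(rows, watermark)
```

The point of the sketch is that the filter itself is independent of the partition a row lives in, so a row changed in an older partition should still pass it.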

Database users informed us that rows in one of the tables were out of date. We traced this back to a particular run of the Data Flow, which I then ran in debug mode to try to reproduce what it did.

In debug mode, the data preview returned 172 rows. When it ran for real, it picked up only 139 rows. Drilling down into the run statistics, I can see that the yearly partitions 2017 to 2023 account for all 139 of these rows. So it appears that partitions 2009 to 2016 were not picked up, even though they contain changes. No errors were reported by the Data Flow or by the ADF pipeline that triggers it.
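The comparison I did by hand can be sketched as below: count rows per yearly partition in the debug preview and in the real run, then list the partitions where the counts differ. This is only an illustration. The `PartitionYear` field, the function names, and the sample rows are hypothetical, and the counts shown are not the real per-partition figures.

```python
from collections import Counter


def rows_per_partition(rows, key="PartitionYear"):
    """Count rows per partition value (hypothetical 'PartitionYear' field)."""
    return Counter(r[key] for r in rows)


def partition_gaps(expected, actual):
    """Return {partition: (expected_count, actual_count)} where counts differ."""
    gaps = {}
    for partition in sorted(set(expected) | set(actual)):
        e, a = expected.get(partition, 0), actual.get(partition, 0)
        if e != a:
            gaps[partition] = (e, a)
    return gaps


# Illustrative rows only, not the real preview or run output.
preview_rows = [{"PartitionYear": y} for y in (2015, 2016, 2017, 2023, 2023)]
run_rows = [{"PartitionYear": y} for y in (2017, 2023, 2023)]

gaps = partition_gaps(rows_per_partition(preview_rows),
                      rows_per_partition(run_rows))
```

With the real counts, partitions that appear in the preview but have a run count of zero would be the ones the run skipped.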

Why would this be? I have not seen this issue on any other Data Flow runs for this table.

Azure Data Factory