We use the Assert transformation to validate CSV files we import into Synapse, but it causes debug previews of the data to slow down and even time out. Running the data flow from a pipeline can take up to 30 minutes on a single input file with 2 records.
I first used a single Assert transformation with one condition entry for each column. There are 20 columns in total. For 18 of the columns, the condition only checks that the column contains data ("Expect true"). The other two conditions were "Expect exists" and used two other Source transformations to look up data. The lookup data contained 6000 rows.
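To make the checks concrete, here is a rough sketch of the equivalent logic in plain Python. This is not the data flow itself. The column names (`col1` to `col18`, `lookup_a`, `lookup_b`) and the `validate_row` helper are hypothetical and only stand in for our real columns and lookup sources.

```python
from typing import Dict, List, Set

# Hypothetical column names; the real file has 20 columns in total.
REQUIRED_COLUMNS: List[str] = [f"col{i}" for i in range(1, 19)]  # 18 "Expect true" checks
LOOKUP_COLUMNS: List[str] = ["lookup_a", "lookup_b"]  # 2 "Expect exists" checks


def validate_row(row: Dict[str, str], lookups: Dict[str, Set[str]]) -> List[str]:
    """Return a description of every failed check for one CSV row."""
    failures: List[str] = []
    # "Expect true": the column must contain data.
    for col in REQUIRED_COLUMNS:
        value = row.get(col)
        if value is None or value.strip() == "":
            failures.append(f"{col}: missing value")
    # "Expect exists": the value must be present in the lookup data.
    for col in LOOKUP_COLUMNS:
        if row.get(col) not in lookups[col]:
            failures.append(f"{col}: not found in lookup data")
    return failures
```

Each input row goes through 18 non-empty checks and 2 membership checks against the lookup data, which is why the run times above surprised us.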
This setup timed out every time I tried to preview data. I then split the Assert transformation into 4 steps. This is a little better, but the 2 lookup asserts normally time out. Running the pipeline still takes 30 minutes.
Removing the two "Expect exists" steps does improve the situation, but there are still timeouts and the pipeline runs for 9 minutes.
Specific error messages we're getting:
- Failed to fetch data preview due to operation timeout.
- Could not fetch statistics due to operation timeout.
Are we doing something wrong, or is the Assert transformation not meant to validate this many columns?