Synapse data flow Assert transformation timeout

Pierre-Andre van Leeuwen
2022-02-07T14:21:24.383+00:00

We use the Assert transformation to validate CSV files that we import into Synapse, but it makes debug previews of the data slow down and even time out. Running the data flow from a pipeline can take up to 30 minutes for a single input file containing only 2 records.

I first used a single Assert transformation with one condition entry for each column. There are 20 columns in total. Eighteen of the conditions only check that the column contains data ("Expect true"). The other two are "Expect exists" conditions that use two additional Source transformations to look up data; the lookup data contains 6,000 rows.
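To make the workload concrete, here is a minimal plain-Python sketch of the two kinds of checks described above: "Expect true" non-empty checks and "Expect exists" lookups against reference data. It is only an illustration, not how Synapse evaluates Assert rules; the function `validate_rows`, the column names, and the sample data are all hypothetical. The lookup values are held in a set, so each existence check is a single membership test.

```python
# Hypothetical plain-Python illustration of the checks described above.
# This is NOT how Synapse's Assert transformation is implemented.
import csv
import io


def validate_rows(rows, required_columns, exists_rules):
    """Return a list of (row_index, message) failures.

    required_columns: columns that must be non-empty ("Expect true" checks).
    exists_rules: mapping of column name -> set of allowed lookup values
                  ("Expect exists" checks against reference data).
    """
    failures = []
    for i, row in enumerate(rows):
        # "Expect true": the column must contain data.
        for col in required_columns:
            value = row.get(col)
            if value is None or value.strip() == "":
                failures.append((i, f"{col} is empty"))
        # "Expect exists": the value must appear in the lookup data.
        for col, allowed in exists_rules.items():
            if row.get(col) not in allowed:
                failures.append((i, f"{col}={row.get(col)!r} not found in lookup"))
    return failures


# Hypothetical two-record input file and a 6,000+ value lookup set.
sample = io.StringIO("id,name,country\n1,Alice,NL\n2,,XX\n")
rows = list(csv.DictReader(sample))
lookup_countries = {f"C{i}" for i in range(6000)} | {"NL"}

failures = validate_rows(rows, ["id", "name"], {"country": lookup_countries})
print(failures)
# → [(1, 'name is empty'), (1, "country='XX' not found in lookup")]
```

For a two-record file, 20 checks like these amount to a few dozen comparisons, which suggests the reported 9 to 30 minute runtimes are likely spent outside the rule logic itself.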

This setup timed out every time I tried to preview data. I then split the Assert transformation into four steps. This is slightly better, but the two lookup asserts usually time out, and running the pipeline still takes 30 minutes.

Removing the two "Expect exists" steps does improve the situation, but there are still timeouts, and the pipeline still runs for 9 minutes.

Specific error messages we're getting:

  • Failed to fetch data preview due to operation timeout.
  • Could not fetch statistics due to operation timeout.

Are we doing something wrong, or is the Assert transformation not meant to validate this many columns?

Azure Synapse Analytics

1 answer

  1. ShaikMaheer-MSFT (Microsoft Employee)
    2022-02-09T14:40:05.68+00:00

    Hi @Pierre-Andre van Leeuwen ,

    Thank you for posting your query on the Microsoft Q&A platform.

    The Assert transformation should work without delays, just like other transformations. I tried an Assert transformation on my end and did not see this behavior.

    Could you please confirm how the other transformations in the data flow behave?

    Also, consider increasing the number of vCores (core count) in the data flow's compute settings to see if that helps.

    Refer to the performance tuning recommendations for data flows in the Microsoft documentation.

    Please let us know how it goes.