Azure Synapse Pipeline: Create reusable Quality Assurance modules

Question

Hi,

There is a requirement from the project team to design a Synapse-based solution with reusable load assurance and quality assurance controls built in.

The idea is that any source entity being ETLed to the target could use these prebuilt QA code blocks/modules in its pipeline.

The catch is that the rules applied differ by source entity: some entities may need only a handful of these QA rules, others none, and others all of them.

I'd appreciate any guidance here.

Answer

Hi @Sumit Bhatnagar ,

Thank you for posting your query on the Microsoft Q&A platform.

Sorry, your ask is not clear. Could you please elaborate with a sample scenario of what you are trying to achieve?

Do you mean implementing pipelines in Synapse that perform quality validations on the target data? If yes, mapping data flows provide an Assert transformation that can be used for data quality checks. Please refer to the video below to understand it better.
https://www.youtube.com/watch?v=_NzWpTRxt0s
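The Assert transformation itself is configured in the data flow designer rather than written as code. Purely as an analogy, the plain-Python sketch below mimics the three kinds of checks it offers (expect true, expect unique, expect exists) on a list of rows, returning the rows that fail each check. The function names and sample data are hypothetical and are not part of any Synapse API.

```python
from collections import Counter
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


def expect_true(rows: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Return the rows that violate the predicate (analogous to 'expect true')."""
    return [r for r in rows if not predicate(r)]


def expect_unique(rows: List[Row], column: str) -> List[Row]:
    """Return the rows whose `column` value occurs more than once (analogous to 'expect unique')."""
    counts = Counter(r[column] for r in rows)
    return [r for r in rows if counts[r[column]] > 1]


def expect_exists(rows: List[Row], column: str, reference: Iterable[Any]) -> List[Row]:
    """Return the rows whose `column` value is missing from the reference values (analogous to 'expect exists')."""
    allowed = set(reference)
    return [r for r in rows if r[column] not in allowed]


# Hypothetical sample data standing in for a target table.
customers = [
    {"id": 1, "country": "IN", "age": 34},
    {"id": 2, "country": "US", "age": -5},
    {"id": 2, "country": "XX", "age": 41},
]

print(expect_true(customers, lambda r: r["age"] >= 0))         # the age -5 row
print(expect_unique(customers, "id"))                          # both id 2 rows
print(expect_exists(customers, "country", ["IN", "US", "UK"]))  # the "XX" row
```

In the real transformation, "expect exists" compares against another stream; the `reference` list plays that role here.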

I see that you mentioned QA code blocks/modules. Does that mean you are looking to develop Python modules and deploy them to Spark pools in Synapse? If yes, please refer to the documentation below, which explains how to do this.
https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-manage-python-packages
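To illustrate the "some, none or all rules per entity" requirement, here is a minimal plain-Python sketch of such a module: a library of named rules plus a per-entity configuration that decides which rules run. The rule names, the `ENTITY_RULES` layout and `run_checks` are all hypothetical. In a Spark pool the rules would operate on DataFrames rather than lists of rows, and the module could be packaged and installed as described in the documentation above.

```python
from collections import Counter
from typing import Any, Callable, Dict, List, Optional, Tuple

Row = Dict[str, Any]
# Each rule takes source rows, target rows and its own parameters,
# and returns an error message, or None when the check passes.
Rule = Callable[[List[Row], List[Row], Dict[str, Any]], Optional[str]]


def row_count_match(source: List[Row], target: List[Row], params: Dict[str, Any]) -> Optional[str]:
    """Load assurance: the target must hold as many rows as the source."""
    if len(source) != len(target):
        return f"row count mismatch: source={len(source)} target={len(target)}"
    return None


def not_null(source: List[Row], target: List[Row], params: Dict[str, Any]) -> Optional[str]:
    """QA: the listed columns must contain no nulls in the target."""
    bad = [c for c in params["columns"] if any(r.get(c) is None for r in target)]
    return f"nulls found in {bad}" if bad else None


def unique_key(source: List[Row], target: List[Row], params: Dict[str, Any]) -> Optional[str]:
    """QA: the key column must not contain duplicates in the target."""
    counts = Counter(r[params["column"]] for r in target)
    dupes = sorted(k for k, n in counts.items() if n > 1)
    return f"duplicate keys {dupes}" if dupes else None


# Library of reusable rules, addressable by name.
RULES: Dict[str, Rule] = {
    "row_count_match": row_count_match,
    "not_null": not_null,
    "unique_key": unique_key,
}

# Hypothetical per-entity configuration: each entity picks some, none or all rules.
ENTITY_RULES: Dict[str, List[Tuple[str, Dict[str, Any]]]] = {
    "customer": [
        ("row_count_match", {}),
        ("not_null", {"columns": ["id", "name"]}),
        ("unique_key", {"column": "id"}),
    ],
    "orders": [("row_count_match", {})],
    "ref_lookup": [],
}


def run_checks(entity: str, source: List[Row], target: List[Row],
               config: Dict[str, List[Tuple[str, Dict[str, Any]]]] = ENTITY_RULES) -> List[str]:
    """Run only the rules configured for `entity` and collect the failure messages."""
    failures = []
    for name, params in config.get(entity, []):
        message = RULES[name](source, target, params)
        if message is not None:
            failures.append(f"{entity}.{name}: {message}")
    return failures


source = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
target = [{"id": 1, "name": "a"}, {"id": 1, "name": None}]
print(run_checks("customer", source, target))
print(run_checks("ref_lookup", source, target))  # no rules configured, so []
```

A pipeline notebook step could call `run_checks` for its entity and fail the run when the returned list is non-empty. Keeping the rule selection in configuration (for example a JSON file or control table, which is an assumption here) means onboarding a new entity does not require code changes.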

Hope this helps. Please let us know how it goes. Thank you.
