Azure Synapse Pipeline: Create reusable Quality Assurance modules

Sumit Bhatnagar 1 Reputation point
2022-04-19T12:54:00.79+00:00

Hi,

The project team requires a Synapse-based solution with reusable load assurance and quality assurance (QA) controls built in.

The idea is that any source entity being ETLed to a target could use these prebuilt QA code blocks/modules in its pipeline.

The catch is that the set of rules varies by source entity: some entities need only a handful of these QA rules, others need none, and others need all of them.

I would appreciate any guidance here.

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

1 answer

  1. ShaikMaheer-MSFT 38,401 Reputation points Microsoft Employee
    2022-04-21T10:00:40.58+00:00

    Hi @Sumit Bhatnagar ,

    Thank you for posting your query on the Microsoft Q&A platform.

    Sorry, your ask is not clear. Could you please elaborate with a sample scenario showing what you are trying to achieve?

    Do you mean implementing pipelines in Synapse that perform quality validations on the target? If so, the Assert transformation in mapping data flows can be used for data quality checks. The video below explains it in more detail.
    https://www.youtube.com/watch?v=_NzWpTRxt0s

    You mentioned QA code blocks/modules. Does that mean you want to develop Python modules and deploy them to Spark pools in Synapse? If so, please refer to the documentation below, which explains how to do this.
    https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-manage-python-packages

    Hope this helps. Please let us know how it goes. Thank you.
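
To make the Python-module option concrete, here is a minimal sketch of how reusable QA rules could be registered once and then switched on per source entity through configuration. Everything in it is an assumption for illustration: the rule names, the `ENTITY_RULES` mapping, the `id` column, and the `run_qa` helper are hypothetical and not part of any Synapse API. Rows are plain dictionaries so the sketch runs anywhere; in a Spark pool the same pattern would more likely operate on Spark DataFrames.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
# A rule takes the loaded rows and returns a list of failure messages
# (an empty list means the rule passed).
Rule = Callable[[List[Row]], List[str]]

# Registry of all reusable QA rules, keyed by a hypothetical rule name.
RULES: Dict[str, Rule] = {}


def rule(name: str) -> Callable[[Rule], Rule]:
    """Decorator that adds a function to the rule registry."""
    def register(fn: Rule) -> Rule:
        RULES[name] = fn
        return fn
    return register


@rule("not_empty")
def not_empty(rows: List[Row]) -> List[str]:
    return [] if rows else ["no rows loaded"]


@rule("id_not_null")
def id_not_null(rows: List[Row]) -> List[str]:
    missing = sum(1 for r in rows if r.get("id") is None)
    return [f"{missing} row(s) with null id"] if missing else []


@rule("id_unique")
def id_unique(rows: List[Row]) -> List[str]:
    ids = [r.get("id") for r in rows if r.get("id") is not None]
    dupes = len(ids) - len(set(ids))
    return [f"{dupes} duplicate id value(s)"] if dupes else []


# Hypothetical per-entity configuration: all rules, some rules, or none.
ENTITY_RULES: Dict[str, List[str]] = {
    "customer": ["not_empty", "id_not_null", "id_unique"],
    "orders": ["not_empty"],
    "audit_log": [],
}


def run_qa(entity: str, rows: List[Row]) -> List[str]:
    """Run only the rules configured for this entity and collect failures."""
    failures: List[str] = []
    for name in ENTITY_RULES.get(entity, []):
        failures.extend(f"{name}: {msg}" for msg in RULES[name](rows))
    return failures
```

Calling `run_qa("customer", rows)` runs all three rules, `run_qa("orders", rows)` runs only `not_empty`, and `run_qa("audit_log", rows)` runs nothing. One possible design, again an assumption rather than a prescribed approach, is to package such a module for the Spark pool as described in the documentation linked above and have the pipeline pass the entity name to a notebook as a parameter, so the rule selection lives in configuration rather than in each pipeline.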