Mapping Dataflow vs SQL Stored Procedure in ADF pipeline

Alok Thampi 151

Hello,

I have a requirement where I need to choose between Mapping Data Flow and SQL stored procedures in an ADF pipeline to implement some business scenarios. The data volume is not very large now, but it might grow at a later stage.
The business logic is at times complex: I will have to join multiple tables, write subqueries, and use window functions, nested CASE statements, etc.
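To make the kind of logic described above concrete, here is a minimal sketch of one query that combines a join, a subquery, a window function, and nested CASE expressions. It assumes hypothetical `customers` and `orders` tables and uses Python's built-in `sqlite3` module (SQLite 3.25 or later is needed for window functions) only so that it runs without a database server; inside a stored procedure the T-SQL would look very similar, while in Mapping Data Flow each piece would roughly become its own transformation (Join, Exists or Aggregate, Window, Derived Column).

```python
import sqlite3

# Hypothetical tables standing in for the real business schema.
# SQLite is used only so the sketch is runnable anywhere.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'EU'), (2, 'US'), (3, 'EU');
INSERT INTO orders VALUES
    (10, 1, 120.0), (11, 1, 30.0),
    (12, 2, 500.0), (13, 2, 50.0),
    (14, 3, 20.0);
""")

QUERY = """
SELECT c.region,
       o.order_id,
       o.amount,
       -- window function: rank orders by amount within each region
       RANK() OVER (PARTITION BY c.region ORDER BY o.amount DESC) AS amount_rank,
       -- nested CASE: the tier depends on both region and amount
       CASE
           WHEN c.region = 'US' THEN
               CASE WHEN o.amount >= 100 THEN 'US-high' ELSE 'US-low' END
           ELSE
               CASE WHEN o.amount >= 100 THEN 'INTL-high' ELSE 'INTL-low' END
       END AS tier
FROM orders AS o
JOIN customers AS c ON c.customer_id = o.customer_id
-- subquery: keep only customers whose total spend exceeds 100
WHERE o.customer_id IN (
    SELECT customer_id FROM orders GROUP BY customer_id HAVING SUM(amount) > 100
)
ORDER BY c.region, amount_rank
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Customer 3 (total spend 20) is dropped by the subquery, and the remaining orders are ranked per region. Expressing the same thing in a single SQL statement versus several chained data flow transformations is the authoring-effort trade-off raised in the question.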

All of my business requirements could easily be implemented in a stored procedure (SP), but there is a slight inclination towards Mapping Data Flow because it runs on Spark underneath and can scale up as required.

Does ADF Mapping Data Flow have the upper hand over SQL stored procedures when used in an ADF pipeline?
Some of my concerns with Mapping Data Flow are:
• Implementing complex logic with data flows takes much longer than with a stored procedure.
• The execution time of a mapping data flow is much higher because of the time it takes to spin up the Spark cluster.

Now, if I decide to use SQL SPs in the pipeline, what could be the disadvantages?
Would there be issues with scalability if the data volume grows rapidly at some point?

Thanks.

Accepted answer

Answer 1

Hello @Alok Thampi ,

Thanks for putting the ask so elaborately (it does help), and welcome to the forum.

[Question 1] Does ADF Mapping Data Flow have the upper hand over SQL stored procedures when used in an ADF pipeline?
Some of my concerns with Mapping Data Flow are:
• Implementing complex logic with data flows takes much longer than with a stored procedure.
• The execution time of a mapping data flow is much higher because of the time it takes to spin up the Spark cluster.

[Answer 1] Since MDF runs on Spark clusters managed by ADF, it does have the advantage of scale. But I must let you know that MDF is a very new product, not even one year old, and it is evolving fast. Its authoring experience does not require much coding (I believe only expressions and queries). Writing stored procedures, on the other hand, requires good T-SQL knowledge to put the complex logic in place. As you have mentioned that you do not have huge data, I think you can use ADF with SPs and get the work done. I also think you should definitely consider the cost factor for ADF and MDF. Please do read more on that here.

When complex transformation logic is executed in an SP, the compute load falls on the database server where the SP runs; with MDF, the work runs outside that server, on the Spark cluster. I think this point is worth considering.

[Question 2] Now, if I decide to use SQL SPs in the pipeline, what could be the disadvantages?
Would there be issues with scalability if the data volume grows rapidly at some point?

[Answer 2] As mentioned above, depending on how much code you want to write, I think you can implement all the logic with SPs. You did not mention which database server you are using; if you are using Azure SQL, you can scale up and down, upgrading or downgrading within and across the service tiers.

Thanks
Himanshu
Please consider clicking "Accept Answer" and "Up-vote" on the post that helps you, as it can benefit other community members.

Alok Thampi 151 Reputation points

2020-09-16T02:27:40.78+00:00

Appreciate the quick response, Himanshu, that helps!
The database options are Azure SQL and Azure Synapse.

Also, is there an option to capture lineage if we implement the logic with SQL SPs?
That was another reason MDF was a candidate: it works well with ADC (Azure Data Catalog).
HimanshuSinha-msft 19,491 Reputation points Microsoft Employee Moderator

2020-09-23T23:44:05.613+00:00

Hello,
Apologies for the delayed response from my side.

To your point about capturing lineage: there is no easy way to do that, apart from designing the activities and the underlying SPs yourself so that they maintain lineage through the Stored Procedure activity.
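A minimal sketch of that do-it-yourself approach, assuming a hypothetical `etl_lineage` table that each stored procedure writes to in the same transaction as its data load. Python's `sqlite3` stands in for the database, and the hypothetical `load_fact_orders` function stands in for a stored procedure; in ADF, the Stored Procedure activity could pass the pipeline run ID (for example from the `@pipeline().RunId` system variable) as a parameter. All table, column, and function names here are illustrative assumptions.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE etl_lineage (
    pipeline_run_id TEXT NOT NULL,     -- e.g. passed in from @pipeline().RunId
    step_name       TEXT NOT NULL,
    source_tables   TEXT NOT NULL,     -- comma-separated list of inputs
    target_table    TEXT NOT NULL,
    rows_written    INTEGER NOT NULL,
    logged_at       TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE staging_orders (order_id INTEGER, amount REAL);
CREATE TABLE fact_orders (order_id INTEGER, amount REAL);
INSERT INTO staging_orders VALUES (1, 10.0), (2, 25.5), (3, -1.0);
""")


def load_fact_orders(conn, pipeline_run_id):
    """Hypothetical stand-in for a stored procedure: load data, then log lineage."""
    with conn:  # one transaction: the load and its lineage row commit together
        cur = conn.execute(
            "INSERT INTO fact_orders "
            "SELECT order_id, amount FROM staging_orders WHERE amount >= 0"
        )
        conn.execute(
            "INSERT INTO etl_lineage "
            "(pipeline_run_id, step_name, source_tables, target_table, rows_written) "
            "VALUES (?, ?, ?, ?, ?)",
            (pipeline_run_id, "load_fact_orders", "staging_orders",
             "fact_orders", cur.rowcount),
        )


# In ADF the run ID would come from the pipeline; here we just generate one.
run_id = str(uuid.uuid4())
load_fact_orders(conn, run_id)
print(conn.execute(
    "SELECT source_tables, target_table, rows_written FROM etl_lineage"
).fetchall())
```

Querying `etl_lineage` by run ID then shows which sources fed which targets in a given pipeline run. It is a hand-maintained log, so every SP has to follow the same convention for it to stay complete.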

To your last point, I am curious why MDF works well with ADC and ADF does not. I am pretty sure I am missing something here; can you please elaborate?

Thanks
Himanshu
manish verma 516 Reputation points

2023-10-07T17:06:22.1066667+00:00

Thanks to both of you, this is a great discussion. Take a use case: if we use SQL Managed Instance and create external tables to limit data volume growth inside the database, which option would be better? We should aim to go with one approach only. If a Microsoft expert says we can achieve any transformation logic with MDF, now and in the future, then I agree MDF is a good choice. But if there is something we cannot implement and we have to fall back to SPs, maintaining multiple approaches for data loads will be a problem.

Also, if you could share the cost difference between running a transformation in MDF and running it as an SP on Azure SQL Managed Instance, and which one costs less, I would really appreciate it.

Let's take this use case with a low volume of data, around 300 to 500 GB.
