Mapping Dataflow vs SQL Stored Procedure in ADF pipeline

Alok Thampi 151 Reputation points
2020-09-15T15:36:58.58+00:00

Hello,

I have a requirement where I need to choose between Mapping Data Flow vs SQL Stored Procedures in an ADF pipeline to implement some business scenarios. The data volume is not too huge now but might get larger at a later stage.
The business logic are at times complex where I will have to join multiple tables, write sub queries, use windows functions, nested case statements, etc.

All of my business requirements could be easily implemented through a SP but there is a slight inclination towards mapping data flow considering that it runs spark underneath and can scale up as required.

Does ADF Mapping data flow has an upper hand over SQL Stored Procedures when used in an ADF pipeline?
Some of the concerns that I have with the mapping data flow are as below.
• Time taken to implement complex logic using data flows is much more than a stored procedure
• The execution time for a mapping data flow is much higher considering the time it takes to spin up the spark cluster.

Now, if I decide to use SQL SPs in the pipeline, what could be the disadvantages?
Would there be issues with the scalability if the data volume grows rapidly at some point in time?

Thanks.

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
11,652 questions
0 comments No comments
{count} votes

Accepted answer
  1. HimanshuSinha-msft 19,491 Reputation points Microsoft Employee Moderator
    2020-09-15T23:38:03.237+00:00

    Hello @Alok Thampi ,

    Thanks for the putting ask so elaborately ( it does help ) and also welcome to the forum .

    [Question1 ]Does ADF Mapping data flow has an upper hand over SQL Stored Procedures when used in an ADF pipeline?
    Some of the concerns that I have with the mapping data flow are as below.
    • Time taken to implement complex logic using data flows is much more than a stored procedure
    • The execution time for a mapping data flow is much higher considering the time it takes to spin up the spark cluster.

    [Answer 1 ] I think since MDF is running on Azure Databricks it does have the advantage of scale . But then i must let you know that MDF is very new product and is not even 1 year old and is evolving fast . The authoring experience which it brings does not require much coding ( i believe only expression and queries ) . On the other hand if you are writing stored procedure and it does require a good TSQL knowledge to put the complex logic . As you have mentiopned you do not have a huge data , i think you can use ADF with SP and get the work down . Also I think you should defineitely consider the cost factor for ADF and MDF . Please do read more on that here .

    In case of executing complex transformation logic the compute load will be on the DB server on which the SP is running , but in case of MDF it will outside the server , i think this point is worth considering .

    [Question 2 ]Now, if I decide to use SQL SPs in the pipeline, what could be the disadvantages?
    Would there be issues with the scalability if the data volume grows rapidly at some point in time?

    [Answer 2 ] As mentioned above depending upon how much code you want to write , i think with SP you can implement all the logic . You never mentioned database server which you are using , if you are using the Azure SQL , you can use scale up and down version and upgrade/downgrade within the tiers .

    Thanks
    Himanshu
    Please do consider to click on "Accept Answer" and "Up-vote" on the post that helps you, as it can be beneficial to other community members

    1 person found this answer helpful.

0 additional answers

Sort by: Most helpful

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.