Where does using Mapping Data Flow make sense (design question)?

CK71 41 Reputation points
2022-03-28T13:04:45.263+00:00

My assumptions about where MDF might be the right fit are as follows:

1) MDF can be used as a data wrangling tool by end users.

2) MDF is better suited to SQL Server-based data warehouse architectures, where it loads data into staging or a data lake in a clean format (i.e., it prepares the data before it is loaded into the SQL Server DWH, and a proper ETL tool then does the transformations).

3) If MDF has to be used for light ELT/ETL tasks directly on a data lake or DWH, it needs to be supplemented for complex transformations with SSIS packages, custom Python, or stored procedures (rough sketch below).
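
A rough example of what I mean in 3) by handing the complex transformation to a stored procedure after MDF has staged the data; the server, database, login, and procedure names are just placeholders:

```python
# Minimal sketch: after MDF has landed clean data in staging, call a stored
# procedure in the DWH to run the heavier transformation.
# All names below (server, database, user, procedure, batch id) are
# hypothetical placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myserver.database.windows.net;"
    "DATABASE=mydwh;"
    "UID=etl_user;PWD=<password>;"
)

cursor = conn.cursor()
# Run the transformation that MDF only staged the data for.
cursor.execute("EXEC dbo.usp_TransformStagedOrders @BatchId = ?", 42)
conn.commit()
conn.close()
```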

My question would be:

A) Has anyone used Mapping Data Flow in production for options 2 and 3 above?

B) If assumption 3 is valid, would you suggest going for Spark-based transformations or an ETL tool rather than patching MDF with customizations, given that new versions might not remain compatible, etc.?

Azure Data Factory

Accepted answer
  HimanshuSinha-msft 19,476 Reputation points Microsoft Employee
    2022-03-28T22:29:46.123+00:00

    Hello @CK71,
    Thanks for the question and for using the MS Q&A platform.
    As we understand it, the ask here is for clarity on a design question; please do let us know if that is not accurate.

    A) Has anyone used Mapping Data Flow in production for options 2 and 3 above?

    Yes, we do have very high-end customers who are using MDF in production. MDF also enables you to implement different kinds of transformations without writing code. I would politely disagree with the statement that it is better suited to a SQL Server data warehouse, as we see users implement JSON/XML transformations extensively.

    B) If assumption 3 is valid, would you suggest going for Spark-based transformations or an ETL tool rather than patching MDF with customizations, given that new versions might not remain compatible, etc.?

    A lot of the Spark functions are implemented in MDF, but if you have a requirement where MDF falls short, you can always use a Spark-based transformation. With the increasing popularity of Spark and its many libraries, I would vote for Spark if MDF is not helping. Please be aware that MDF/Spark takes the computational load of the transformation off the source or sink server, whereas with SSIS or another ETL product that may or may not be the case.
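
    As a rough sketch (not a prescription) of what such a Spark-based transformation could look like, assuming a PySpark environment such as Azure Databricks or Synapse Spark and purely hypothetical storage paths and column names, flattening nested JSON from the lake and writing curated Parquet might be:

    ```python
    # Rough PySpark sketch: flatten nested JSON landed in the data lake and
    # write the curated result as Parquet. Paths, containers, and column
    # names below are hypothetical examples.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, explode

    spark = SparkSession.builder.appName("flatten-orders").getOrCreate()

    # Read the raw JSON dropped by the ingestion pipeline.
    raw = spark.read.json("abfss://raw@mydatalake.dfs.core.windows.net/orders/")

    # Explode the nested line-item array and project a flat schema,
    # roughly what MDF's Flatten transformation does graphically.
    flat = (
        raw.withColumn("item", explode(col("lineItems")))
           .select(
               col("orderId"),
               col("customer.id").alias("customerId"),
               col("item.sku").alias("sku"),
               col("item.quantity").alias("quantity"),
           )
    )

    # Write the curated output back to the lake for the DWH to pick up.
    flat.write.mode("overwrite").parquet(
        "abfss://curated@mydatalake.dfs.core.windows.net/orders_flat/"
    )
    ```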

    Please do let me know if you have any queries.
    Thanks
    Himanshu


