Where does using Mapping Data Flow make sense (design question)?

CK71 41 Reputation points
2022-03-28T13:04:45.263+00:00

My assumptions about where MDF might be the right fit are as follows:

1) MDF can be used as a data wrangling tool by end users.

2) MDF is better suited to SQL Server-based data warehouse architectures, where it loads data into staging or a data lake in a clean format (i.e., it prepares the data before it is loaded into the SQL Server DWH, and a proper ETL tool then does the transformations).

3) If MDF has to be used for light ELT/ETL tasks directly on a data lake or DWH, it needs to be supplemented for complex transformations with SSIS packages, custom Python, or stored procedures (rough sketch below).
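
A rough example of what I mean in 3) by handing the complex transformation to a stored procedure after MDF has staged the data; the server, database, login, and procedure names are just placeholders:

```python
# Minimal sketch: after MDF has landed clean data in staging, call a stored
# procedure in the DWH to run the heavier transformation.
# All names below (server, database, user, procedure, batch id) are
# hypothetical placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myserver.database.windows.net;"
    "DATABASE=mydwh;"
    "UID=etl_user;PWD=<password>;"
)

cursor = conn.cursor()
# Run the transformation that MDF only staged the data for.
cursor.execute("EXEC dbo.usp_TransformStagedOrders @BatchId = ?", 42)
conn.commit()
conn.close()
```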

My question would be:

A) Has anyone used Mapping Data Flow in production for options 2 and 3 above?

B) If assumption 3 is valid, would you suggest going for Spark-based transformations or an ETL tool rather than patching MDF with customizations, given that new versions might not remain compatible, etc.?

Azure Data Factory

Accepted answer
  HimanshuSinha-msft 19,476 Reputation points Microsoft Employee
    2022-03-28T22:29:46.123+00:00

    Hello @CK71,
    Thanks for the question and for using the MS Q&A platform.
    As we understand it, the ask here is for clarity on a design question; please do let us know if that is not accurate.

    A) Has anyone used Mapping Data Flow in production for options 2 and 3 above?

    Yes, we do have very high-end customers who are using MDF in production. MDF also enables you to implement different kinds of transformations without writing code. I would politely disagree with the statement that it is better suited to a SQL Server data warehouse, as we see users implement JSON/XML transformations extensively.

    B) If assumption 3 is valid, would you suggest going for Spark-based transformations or an ETL tool rather than patching MDF with customizations, given that new versions might not remain compatible, etc.?

    A lot of the Spark functions are implemented in MDF, but if you have a requirement where MDF falls short, you can always use a Spark-based transformation. With the increasing popularity of Spark and its many libraries, I would vote for Spark if MDF is not helping. Please be aware that MDF/Spark takes the computational load of the transformation off the source or sink server, whereas with SSIS or another ETL product that may or may not be the case.
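
    As a rough sketch (not a prescription) of what such a Spark-based transformation could look like, assuming a PySpark environment such as Azure Databricks or Synapse Spark and purely hypothetical storage paths and column names, flattening nested JSON from the lake and writing curated Parquet might be:

    ```python
    # Rough PySpark sketch: flatten nested JSON landed in the data lake and
    # write the curated result as Parquet. Paths, containers, and column
    # names below are hypothetical examples.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, explode

    spark = SparkSession.builder.appName("flatten-orders").getOrCreate()

    # Read the raw JSON dropped by the ingestion pipeline.
    raw = spark.read.json("abfss://raw@mydatalake.dfs.core.windows.net/orders/")

    # Explode the nested line-item array and project a flat schema,
    # roughly what MDF's Flatten transformation does graphically.
    flat = (
        raw.withColumn("item", explode(col("lineItems")))
           .select(
               col("orderId"),
               col("customer.id").alias("customerId"),
               col("item.sku").alias("sku"),
               col("item.quantity").alias("quantity"),
           )
    )

    # Write the curated output back to the lake for the DWH to pick up.
    flat.write.mode("overwrite").parquet(
        "abfss://curated@mydatalake.dfs.core.windows.net/orders_flat/"
    )
    ```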

    Please do let me know if you have any queries.
    Thanks
    Himanshu


