Can Mapping Data Flow make the cut as a mature ETL tool?

Anonymous
2022-03-28T22:05:50.087+00:00

Dear forum,

ADF can handle light ETL/ELT jobs, whereas SSIS is typically used to support complex requirements.

ADF with Mapping Data Flow is positioned as an alternative to SSIS (but is Data Flow mature enough yet to replace SSIS?).

Would you say Mapping Data Flow is a prime-time ETL tool that can perform any transformation, small or complex, to create data assets in a data lake or DWH? What is the right approach when customizations are required, and how will those customizations be supported across new versions?

Would you agree that one is better off going with PySpark-based transformations by default to be on the safe side, and using Mapping Data Flow only for smaller tasks (data curation, loading data into a data lake, etc.) within data pipelines?

Azure Data Factory

Accepted answer
  1. AnnuKumari-MSFT 34,561 Reputation points Microsoft Employee Moderator
    2022-03-30T13:40:17.653+00:00

    Hi @Anonymous,

    Thank you for using the Microsoft Q&A platform, and thanks for posting your query.

    As I understand your question, you would like clarification on whether Mapping Data Flow can be used instead of SSIS, and whether it is capable of performing all the kinds of transformations that SSIS can. Please let me know if my understanding of your query is incorrect.

    First of all, to give some brief context: mapping data flows allow you to develop data transformation logic without writing code. You call your data flow from within Azure Data Factory pipelines, and it executes on scaled-out Apache Spark clusters.
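    For comparison, here is roughly what a simple data flow (source, derived column, aggregate, sink) does when the same logic is expressed directly in PySpark. This is only a sketch: the storage paths and column names are invented for illustration.

    ```python
    # Hypothetical PySpark equivalent of a simple mapping data flow:
    # source -> derived column -> aggregate -> sink.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    # Source: read raw files from the lake (path is a placeholder)
    sales = spark.read.parquet("abfss://raw@mylake.dfs.core.windows.net/sales/")

    # Derived column + aggregate
    curated = (
        sales
        .withColumn("order_date", F.to_date("order_timestamp"))
        .groupBy("order_date", "region")
        .agg(F.sum("amount").alias("total_amount"))
    )

    # Sink: write the curated data set back to the lake
    curated.write.mode("overwrite").parquet(
        "abfss://curated@mylake.dfs.core.windows.net/daily_sales/"
    )
    ```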

    Kindly refer to the following article to review each transformation available in data flows: Mapping data flow transformations

    To answer your query: with ADF data flows you can do most of the transformations you can do with SSIS. There are still some missing bits and pieces, but most of the commonly used functionality is there, and ADF offers more than 90 connectors for different data sources.

    SSIS has a built-in .NET integration that lets you create Script tasks using .NET code. In ADF, you can achieve similar customization by calling Azure Functions, or by using a Custom activity within your pipeline to run your own code, as sketched below.
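    As a minimal sketch, custom logic that would have been an SSIS Script task could live in an HTTP-triggered Azure Function (Python programming model) that the pipeline's Azure Function activity calls. The payload fields below are hypothetical.

    ```python
    import json
    import azure.functions as func

    def main(req: func.HttpRequest) -> func.HttpResponse:
        # JSON body sent by the pipeline's Azure Function activity
        payload = req.get_json()
        text = payload.get("value", "")

        # Custom transformation the built-in activities don't cover
        result = {"cleaned": text.strip().upper()}

        # Return JSON so downstream activities can reference the output
        return func.HttpResponse(json.dumps(result), mimetype="application/json")
    ```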

    For more advanced needs or complex transformations, you can also run Databricks activities within ADF pipelines (notebooks or JARs written in Scala, Java, or Python), and integrate with Hadoop (Hive and Pig) and Spark.
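    As an example of the kind of heavier transformation you might push to a Databricks notebook invoked by an ADF Databricks activity, the sketch below deduplicates to the latest record per key with a window function. The table paths and column names are invented.

    ```python
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.window import Window

    spark = SparkSession.builder.getOrCreate()

    customers = spark.read.format("delta").load("/mnt/raw/customers")

    # Keep only the most recent row per customer_id
    latest = Window.partitionBy("customer_id").orderBy(F.col("updated_at").desc())
    deduped = (
        customers
        .withColumn("rn", F.row_number().over(latest))
        .filter(F.col("rn") == 1)
        .drop("rn")
    )

    deduped.write.format("delta").mode("overwrite").save("/mnt/curated/customers")
    ```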

    In ADF you can integrate your pipelines with Azure DevOps and use Git for versioning.

    Moreover, ADF incorporates monitoring and diagnostic tools that in SSIS you had to build yourself. You can see much more easily which activity failed with which error, and you can automate an email notification from the pipeline using Azure Logic Apps, as in the sketch below.
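    For illustration only, this is effectively what a Web activity does when it posts failure details to a Logic App "When an HTTP request is received" trigger that then sends the email. The trigger URL and field names are placeholders.

    ```python
    import json
    import urllib.request

    # Placeholder URL of the Logic App HTTP trigger
    logic_app_url = "https://prod-00.westeurope.logic.azure.com/workflows/<id>/triggers/manual/paths/invoke"

    payload = {
        "pipelineName": "pl_load_sales",      # hypothetical pipeline name
        "activityName": "df_curate_sales",    # hypothetical failed activity
        "errorMessage": "Sink write failed",
    }

    req = urllib.request.Request(
        logic_app_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
    ```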

    If you want an integrated platform for ADF pipelines, SQL technologies, big data technologies for writing Spark code, and reporting with Power BI, then I would suggest going for Azure Synapse Analytics. It brings together the best of the SQL technologies used in enterprise data warehousing, the Spark technologies used for big data, Data Explorer for log and time series analytics, pipelines for data integration and ETL/ELT, and deep integration with other Azure services such as Power BI, Cosmos DB, and Azure ML.

    Hope this helps. Please let us know if you have any further queries.


0 additional answers
