How to replace a Data Flow activity (and the transformations inside it) in ADF with other activities

Rahi Jangle 0 Reputation points
2024-07-11T20:07:42.1866667+00:00

I have an Azure Data Factory pipeline that contains a Data Flow activity. The Data Flow activity points to a source file in a storage account, reads data from it, and then performs various transformations on the data using conditional split, derived column, flatten, and so on.

We have an auto-resolving integration runtime with a managed virtual network, but we are not willing to use it because its cost is high. We also have a self-hosted integration runtime, but it is not supported by Data Flow.

So I want to replace the entire Data Flow activity and the transformations within it. Is it possible to achieve the same result in Azure Data Factory? Is there a standard practice followed in the industry to replace Data Flow?

Tags: Azure SQL Database, Azure Functions, Azure Databricks, Azure Data Factory

1 answer

Sort by: Most helpful
  1. Pinaki Ghatak 4,690 Reputation points Microsoft Employee
    2024-07-12T09:48:37.4933333+00:00

    Hello @Rahi Jangle. There are several transformation activities available in Azure Data Factory that you can use to replace the Data Flow activity.

    For example, you can use the following activities to perform transformations on your data:

    • HDInsight Hive activity
    • HDInsight Pig activity
    • HDInsight MapReduce activity
    • Stored Procedure activity
    • Data Lake Analytics U-SQL activity
    • .NET custom activity

    Each of these activities has its own set of capabilities and limitations. You can choose the activity that best fits your requirements.

    To replace the Data Flow activity, you need to create a new pipeline and add the appropriate activities to it. You can use the Copy activity to copy data from your source to your destination, and then use the transformation activities to perform the required transformations on the data.
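
    As a rough sketch of that pattern, the pipeline below copies the source file into a staging table and then runs a stored procedure to apply the transformations. All names (pipeline, datasets, linked service, and the `dbo.usp_TransformStagedData` procedure) are illustrative placeholders, not part of the original question:

    ```json
    {
      "name": "ReplaceDataFlowPipeline",
      "properties": {
        "activities": [
          {
            "name": "CopyRawToStaging",
            "type": "Copy",
            "inputs": [ { "referenceName": "SourceFileDataset", "type": "DatasetReference" } ],
            "outputs": [ { "referenceName": "StagingTableDataset", "type": "DatasetReference" } ],
            "typeProperties": {
              "source": { "type": "DelimitedTextSource" },
              "sink": { "type": "AzureSqlSink" }
            }
          },
          {
            "name": "TransformStagedData",
            "type": "SqlServerStoredProcedure",
            "dependsOn": [
              { "activity": "CopyRawToStaging", "dependencyConditions": [ "Succeeded" ] }
            ],
            "linkedServiceName": { "referenceName": "AzureSqlLinkedService", "type": "LinkedServiceReference" },
            "typeProperties": { "storedProcedureName": "dbo.usp_TransformStagedData" }
          }
        ]
      }
    }
    ```

    With this approach, the conditional split, derived column, and flatten logic from the Data Flow would be re-expressed inside the stored procedure as T-SQL (CASE expressions, computed columns, OPENJSON/CROSS APPLY, and so on).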

    For example, you can use the HDInsight Hive, Pig, and MapReduce activities to execute Hive queries, Pig queries, and MapReduce programs, respectively, on your own or an on-demand HDInsight cluster.
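
    For instance, an HDInsight Hive activity definition might look like the following. The linked service names, script path, and `defines` values are illustrative assumptions:

    ```json
    {
      "name": "RunHiveTransform",
      "type": "HDInsightHive",
      "linkedServiceName": { "referenceName": "HDInsightLinkedService", "type": "LinkedServiceReference" },
      "typeProperties": {
        "scriptPath": "scripts/transform.hql",
        "scriptLinkedService": { "referenceName": "StorageLinkedService", "type": "LinkedServiceReference" },
        "defines": { "inputPath": "raw/", "outputPath": "curated/" }
      }
    }
    ```

    The Hive script referenced by `scriptPath` would contain the query logic that replaces the Data Flow transformations.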

    You can also use the Stored Procedure activity to execute stored procedures in Azure SQL, Azure Synapse Analytics, or SQL Server.

    You can use the Data Lake Analytics U-SQL activity to run U-SQL scripts in Azure Data Lake Analytics.

    You can use the .NET custom activity to run custom code in HDInsight or Azure Batch.

    Regarding the standard practice followed in the industry to replace Data Flow: it depends on the specific requirements and constraints of the project. However, choosing the appropriate transformation activity based on those requirements is a common practice, so you have many options here.
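
    A minimal sketch of the custom-activity option, running your own transformation program on an Azure Batch pool, might look like this. The command, folder path, and linked service names are placeholders for illustration only:

    ```json
    {
      "name": "RunCustomTransform",
      "type": "Custom",
      "linkedServiceName": { "referenceName": "AzureBatchLinkedService", "type": "LinkedServiceReference" },
      "typeProperties": {
        "command": "TransformApp.exe",
        "folderPath": "customactivity/",
        "resourceLinkedService": { "referenceName": "StorageLinkedService", "type": "LinkedServiceReference" }
      }
    }
    ```

    This is the most flexible option, since `TransformApp.exe` can implement any transformation logic, but it also requires you to provision and manage an Azure Batch pool.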

    1 person found this answer helpful.
