Transition Azure Data Factory Mapping Data Flows to Azure Databricks (Scala?)

Todd Lazure 46 Reputation points
2023-06-13T17:23:24.64+00:00

As a result of some infrastructure changes that are outside of my control, my organization is shifting from the use of Azure Data Factory (including Mapping Data Flows) to Azure Databricks.

It is my understanding that ADF Mapping Data Flows run on a Databricks (or otherwise Spark) cluster behind the scenes. The Data Flow GUI allows for codeless development, while the Mapping Data Flow script is a dataflow-specific scripting language that allows programmatic development of these Mapping Data Flows.
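For illustration, the data flow script behind a simple flow looks roughly like the sketch below (stream names, columns, and the filter expression are all made up for this example; the actual script for any given flow is visible in the Data Flow designer):

```
source(output(
    orderId as string,
    amount as double
  ),
  allowSchemaDrift: true,
  validateSchema: false) ~> source1
source1 filter(amount > 100) ~> filter1
filter1 sink(allowSchemaDrift: true) ~> sink1
```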

However, we have considerable assets already built in Mapping Data Flows that need to be transitioned. To conduct this transition smoothly, is there a way to expose the language/logic that is sent to the Spark compute cluster, ideally in a form that can be migrated to Azure Databricks easily, such as the raw Scala?

When debugging the data flow and errors occur, there is some small leak-through of this language (usually Scala errors), but I haven't found anywhere to view the generated Scala in full.


Accepted answer
    Bhargava-MSFT 31,246 Reputation points · Microsoft Employee · Moderator
    2023-06-14T21:00:31.5366667+00:00

    Hello Todd Lazure,

    Your understanding is correct: Azure Data Factory Mapping Data Flows run on a Databricks (or otherwise Spark) cluster behind the scenes. However, the Mapping Data Flow script is a dataflow-specific scripting language for programmatic development of these flows, and it is distinct from the Scala code that actually executes on the cluster.

    While some of the generated Spark code in Scala does surface when debugging Mapping Data Flows in Azure Data Factory, the full generated code is not exposed. So, there is no supported way to export it or to directly convert Mapping Data Flow code to Scala on Azure Databricks.

    To transition your Mapping Data Flows to Azure Databricks, you will need to manually re-implement the logic and expressions used in your Mapping Data Flows using the Spark APIs (Scala, PySpark, or Spark SQL) in Databricks notebooks or jobs. This may involve significant rework, depending on the complexity of your Mapping Data Flows.

    I hope this answers your question.

