Praveen Kumar Billa 1 Reputation point

I want to know which tool is good for ingestion whether it’s ADF or AzureDatabricks.The source applications we have are salesforce,openair,netsuite..please let me know the reasons for choosing one over the other..

Azure Databricks
Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.
1,917 questions
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
9,529 questions
0 comments No comments
{count} votes

1 answer

Sort by: Most helpful
  1. PRADEEPCHEEKATLA-MSFT 76,921 Reputation points Microsoft Employee

    Hello @Praveen Kumar Billa ,

    Welcome to the Microsoft Q&A platform.

    Azure Data Factory is the best tool for ingestion of the data.

    Azure Data Factory allows you to visually design, build, debug, and execute data transformations at scale on Spark by leveraging Azure Databricks clusters. You can then operationalize your data flows inside a general ADF pipeline with scheduling, triggers, monitoring, etc.

    ADF Data Flows provides a visually oriented design paradigm meant for code-free data transformation. You can also use ADF to execute code in Databricks, if you prefer to write code, using Databricks Notebooks, Python, JARs, etc. using the ADF pipeline activities.

    • Azure Data Factory (ADF) – Now that ADF has a new feature called Data Flow, it can transform data so it is more than just an orchestration tool. Behind the scenes, the ADF JSON code that is created when you build a solution is converted to the appropriate code in the Scala programming language and is prepared, compiled and executed in Azure Databricks. This means Data Flow operates in an ELT manner: It loads the data into a place where Databricks can access it, performs the transformations, and then moves it to the destination. ADF provides a native ETL scheduler so that you can automate data transformation and movement processes either through visual data flows or via script activities that execute in Databricks or other execution engines (so, like with SSIS, data flows are row-by-row transformations and for large amounts of data it may be faster to execute a batch transformation via a script in Databricks). My thoughts on when to use ADF are obviously if you are already using it or if your skillset lies in SSIS as it’s pretty easy to learn ADF with a SSIS background.
      • Azure Databricks – It is a Spark-based analytics platform which makes it great to use if you like to work with Spark, Python, Scala, and notebooks. When choosing between Databricks and ADF, what I’ve noticed is that it depends highly on the customer personas and their capabilities. There are plenty of Data Engineers and Data Scientists who want to get deep into Python or Scala and sling some code in Databricks Notebooks. But the larger audience who wants to focus on building business logic to clean customer/address data, for example, doesn’t want to learn Python libraries, and will use the ADF visual data flow designer. Many of those are also Data Engineers and Data Scientists, but then we start to move up the value stack to include Data Analysts and Business Analysts, which is where we start to overlap with Power BI Dataflow.

    Either way, when you want to orchestrate these cleaning routines with schedules, triggers, and monitors, you want that to be through ADF. Keep in mind if you code your transformations in Databricks Notebooks, you will be responsible for maintaining that code, troubleshooting, and scheduling those routines.

    For more details, you may refer “What product to use to transform your data”.

    The source applications we have are salesforce,openair,netsuite:

    Hope this helps. Do let us know if you any further queries.

    • Please accept an answer if correct. Original posters help the community find answers faster by identifying the correct answer. Here is how.
    • Want a reminder to come back and check responses? Here is how to subscribe to a notification.