Hello @Praveen Kumar Billa ,
Welcome to the Microsoft Q&A platform.
Azure Data Factory is well suited for data ingestion.
Azure Data Factory allows you to visually design, build, debug, and execute data transformations at scale on Spark by leveraging Azure Databricks clusters. You can then operationalize your data flows inside a general ADF pipeline with scheduling, triggers, monitoring, etc.
ADF Data Flows provides a visually oriented design paradigm meant for code-free data transformation. If you prefer to write code, you can also use ADF pipeline activities to execute Databricks Notebooks, Python scripts, JARs, etc.
- Azure Data Factory (ADF) – With the Data Flow feature, ADF can now transform data, so it is more than just an orchestration tool. Behind the scenes, the ADF JSON code created when you build a solution is converted to the appropriate Scala code, then prepared, compiled, and executed on Azure Databricks. This means Data Flow operates in an ELT manner: it loads the data into a place where Databricks can access it, performs the transformations, and then moves it to the destination. ADF provides a native scheduler so that you can automate data transformation and movement either through visual data flows or via script activities that execute in Databricks or other execution engines. (As with SSIS, data flows apply row-by-row transformations, so for large amounts of data it may be faster to execute a batch transformation via a script in Databricks.) ADF is an obvious choice if you are already using it, or if your skill set lies in SSIS, as it is pretty easy to learn ADF with an SSIS background.
- Azure Databricks – A Spark-based analytics platform, which makes it a great fit if you like to work with Spark, Python, Scala, and notebooks. When choosing between Databricks and ADF, what I've noticed is that it depends highly on the customer personas and their capabilities. There are plenty of Data Engineers and Data Scientists who want to get deep into Python or Scala and sling some code in Databricks Notebooks. But the larger audience that wants to focus on building business logic (to clean customer/address data, for example) doesn't want to learn Python libraries and will use the ADF visual data flow designer. Many of those are also Data Engineers and Data Scientists, but then we start to move up the value stack to include Data Analysts and Business Analysts, which is where we start to overlap with Power BI Dataflow.
Either way, when you want to orchestrate these cleaning routines with schedules, triggers, and monitors, you want that to be through ADF. Keep in mind if you code your transformations in Databricks Notebooks, you will be responsible for maintaining that code, troubleshooting, and scheduling those routines.
For more details, you may refer to "What product to use to transform your data".
The source applications you have are Salesforce, OpenAir, and NetSuite:
- ADF - Salesforce connector
- Ingest your data to NetSuite using Azure Data Factory
- Unfortunately, an OpenAir connector is not available in ADF.
Hope this helps. Do let us know if you have any further queries.
- Please accept an answer if correct. Original posters help the community find answers faster by identifying the correct answer. Here is how.
- Want a reminder to come back and check responses? Here is how to subscribe to a notification.
Hello @Praveen Kumar Billa ,
It's not recommended to use Azure Databricks for ingestion.
As per your requirement, the sources (Salesforce, OpenAir, NetSuite) are not supported as native data sources in Azure Databricks.
Reference: ADB - Data Sources.
ADF:
You pay for data pipeline orchestration by activity run, and for activity execution by integration runtime hours.
Data Flows are visually designed components inside of Data Factory that enable data transformations at scale. You pay for the Data Flow cluster execution and debugging time per vCore-hour.
ADB:
ADB bills you for the virtual machines (VMs) provisioned in clusters, plus Databricks Units (DBUs) based on the VM instance selected.
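To make the two billing models concrete, here is a rough back-of-the-envelope sketch. The rates below are hypothetical placeholders, not actual Azure prices; substitute the figures from the ADF and ADB pricing pages for your region and tier.

```python
# Rough cost-comparison sketch. All rates are HYPOTHETICAL placeholders --
# check the Azure pricing pages for your region's actual rates.

ADF_ACTIVITY_RUN_RATE = 0.001     # placeholder: $ per activity run
ADF_DATAFLOW_VCORE_HOUR = 0.274   # placeholder: $ per Data Flow vCore-hour
ADB_VM_HOUR = 0.50                # placeholder: $ per VM per hour
ADB_DBU_RATE = 0.40               # placeholder: $ per DBU
ADB_DBU_PER_VM_HOUR = 1.5         # placeholder: DBUs consumed per VM per hour

def adf_dataflow_cost(vcores: int, hours: float, activity_runs: int) -> float:
    """ADF bills per activity run plus Data Flow execution per vCore-hour."""
    return activity_runs * ADF_ACTIVITY_RUN_RATE + vcores * hours * ADF_DATAFLOW_VCORE_HOUR

def adb_cluster_cost(vms: int, hours: float) -> float:
    """ADB bills for the provisioned VMs plus the DBUs those VMs consume."""
    return vms * hours * (ADB_VM_HOUR + ADB_DBU_PER_VM_HOUR * ADB_DBU_RATE)

# Example: an 8-vCore Data Flow running 1 hour in a 10-activity pipeline,
# versus a 4-VM Databricks cluster running for 1 hour.
adf_estimate = round(adf_dataflow_cost(vcores=8, hours=1.0, activity_runs=10), 2)
adb_estimate = round(adb_cluster_cost(vms=4, hours=1.0), 2)
```

The point of the sketch is the shape of each formula, not the numbers: ADF charges scale with activity runs and vCore-hours, while ADB charges scale with cluster size and uptime.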
Hope this helps.
What needs to be used for OpenAir? Any suggestions or advice? Also, which one is cheaper between ADB and ADF?
Hello @Praveen Kumar Billa ,
From ADF, you can make API calls to OpenAir to pull the data; see Copy data from and to a REST endpoint by using Azure Data Factory.
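As a rough illustration of the pattern the ADF REST connector automates, here is a minimal Python sketch of a paginated pull from an HTTP API. The page size, offset parameter, and stub fetch function are hypothetical placeholders standing in for a real authenticated call to OpenAir's API, whose actual endpoints and paging scheme you would take from OpenAir's documentation.

```python
# Minimal sketch of paginated REST extraction -- the pattern ADF's REST
# connector (with pagination rules) handles for you. The paging scheme and
# the stub below are HYPOTHETICAL; consult OpenAir's API docs for the real API.
from typing import Callable

def fetch_all(fetch_page: Callable[[int], list[dict]], page_size: int = 100) -> list[dict]:
    """Pull pages by offset until a short page signals the end of the data."""
    records: list[dict] = []
    offset = 0
    while True:
        page = fetch_page(offset)
        records.extend(page)
        if len(page) < page_size:
            return records
        offset += page_size

# Stub standing in for a real HTTP call (e.g. a GET with an API token).
def fake_page(offset: int) -> list[dict]:
    data = [{"id": i} for i in range(250)]
    return data[offset:offset + 100]

rows = fetch_all(fake_page)
```

In ADF itself you would not write this loop; you would configure the REST source's pagination rules on the Copy activity and let the service iterate the pages.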
For this kind of copy/ingestion workload, Azure Data Factory is generally cheaper than Azure Databricks.
Hope this helps.
---------------------------------------------------------------------------
Please "Accept the answer" if the information helped you. This will help us and others in the community as well.
Hello @Praveen Kumar Billa ,
Just checking in to see if the above answer helped. If this answers your query, do click Accept Answer and Up-Vote for the same. And, if you have any further queries, do let us know.
Hello @Praveen Kumar Billa ,
Following up to see if the above suggestion was helpful. And, if you have any further queries, do let us know.