Azure Data Factory from Oracle On-Prem to Cosmos DB for NoSQL

Alessandro Pieri 25 Reputation points
2025-01-09T10:51:29.1933333+00:00

Hi everyone,

I'm new to Azure Data Factory and have experience with MuleSoft. I'm evaluating this tool for a pure ETL scenario and would appreciate some insights.

Specifically, I'm looking for information on its capabilities and limitations in database synchronization scenarios. Has anyone here worked on synchronizing an Oracle Database with Azure Cosmos DB? I need to perform an initial data load (IDL) of 30 million records and then handle delta UPSERT/DELETE operations (around 1.5 million records per year) across 8-10 entities/tables.

Additionally, I would like to understand how Azure Data Factory manages success cases, error handling, and retry mechanisms. Does this scenario require a complex data flow design, or can a simple copy activity cover all these requirements?

For context, the Oracle database is located in the client's on-premises data center. I'm considering using an Azure Integration Runtime in the cloud with a VNet/VPN configured from the Azure portal to their data center.

My solution would involve querying Oracle tables/entities based on LAST_UPDATE for both UPSERT and DELETE operations, so the implementation should include a watermark to resume from the last processed point in subsequent runs.
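
Conceptually, per table, I'm picturing something like the following outside of ADF (just a sketch using python-oracledb; the table name, the LAST_UPDATE column, and where the watermark is stored are placeholders from my side):

```python
# Rough sketch of the per-table delta query I have in mind (python-oracledb).
# Table name, LAST_UPDATE column, and watermark storage are placeholders.
import oracledb

def fetch_delta(conn: oracledb.Connection, table: str, last_watermark):
    """Return rows changed since the last successful run, oldest first."""
    sql = f"""
        SELECT *
          FROM {table}
         WHERE LAST_UPDATE > :wm
         ORDER BY LAST_UPDATE
    """
    with conn.cursor() as cur:
        cur.execute(sql, wm=last_watermark)
        cols = [d[0] for d in cur.description]
        return [dict(zip(cols, r)) for r in cur.fetchall()]
```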

Any advice or experiences you can share would be greatly appreciated!

Thanks!

Accepted answer
  1. Vinodh247 27,201 Reputation points MVP
    2025-01-09T16:37:42.92+00:00
    • Use a Self-Hosted Integration Runtime installed near the Oracle database; it is the usual choice for on-premises sources. An Azure Integration Runtime works only if the database is reachable from Azure over a network path you provide (e.g., site-to-site VPN or ExpressRoute) and properly secured.
    • Copy Activity suffices for the initial load and straightforward upserts. Combine it with a Data Flow (or a separate pipeline step) for deletes if needed (the sketch after this list shows the overall shape).
    • Implement a watermark strategy to handle deltas (UPSERT/DELETE), leveraging the LAST_UPDATE field or equivalent.
    • Rely on ADF’s built-in error handling and retries: set a retry count and retry interval on each activity, and use On Failure/On Completion branches for alerting or compensation steps.
    • Scale Cosmos DB RUs up during bulk loads to avoid throttling, then back down afterwards, and tune copy parallelism (see the second sketch below).
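
    To make the Copy/watermark points concrete, here is a minimal sketch of the delta leg in plain Python (in ADF the same shape is a Lookup for the watermark, a Copy for the upserts, and a follow-up step for deletes). The account/database/container names, the IS_DELETED soft-delete flag, and the save_watermark helper are illustrative assumptions, not ADF APIs:

    ```python
    # Sketch of the delta pattern: upsert/delete changed rows in Cosmos DB,
    # then advance the watermark. IS_DELETED and save_watermark are assumed,
    # and the container is assumed to be partitioned on /id.
    from azure.cosmos import CosmosClient

    client = CosmosClient("https://<account>.documents.azure.com", credential="<key>")
    container = client.get_database_client("appdb").get_container_client("customers")

    def sync_delta(rows, new_watermark, save_watermark):
        # rows = Oracle records with LAST_UPDATE greater than the old watermark
        for row in rows:
            if row.get("IS_DELETED") == "Y":              # assumed soft-delete flag
                container.delete_item(item=str(row["ID"]),
                                      partition_key=str(row["ID"]))
            else:
                row["id"] = str(row["ID"])                # Cosmos DB requires an 'id'
                container.upsert_item(row)
        save_watermark(new_watermark)                     # persist only after success
    ```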

    This setup should cover your pure ETL scenario for Oracle-to-Cosmos DB synchronization without requiring an overly complex design.
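
    On the throughput point (the last bullet above), here is a rough sketch of scaling RUs around the initial load and backing off on throttling. replace_throughput and CosmosHttpResponseError are real azure-cosmos (Python SDK) names, but the RU values and the retry policy are illustrative assumptions, and replace_throughput presumes manual (non-autoscale) throughput provisioned at the container level:

    ```python
    # Raise RUs for the 30M-record initial load, retry on throttling (429),
    # then scale back down. RU values here are illustrative only.
    import time
    from azure.cosmos import exceptions

    def bulk_load(container, rows):
        container.replace_throughput(10000)          # bump RUs for the IDL
        try:
            for row in rows:
                for attempt in range(5):
                    try:
                        container.upsert_item(row)
                        break
                    except exceptions.CosmosHttpResponseError as err:
                        if err.status_code != 429:   # only retry throttling
                            raise
                        time.sleep(2 ** attempt)     # simple exponential backoff
        finally:
            container.replace_throughput(400)        # back to steady-state RUs
    ```

    In ADF itself the equivalent knobs are the Copy Activity's retry count/interval and the Cosmos DB sink's write batch size; note that the azure-cosmos SDK also retries 429s a few times internally before raising. Let me know if you need detailed steps on any of these.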

