Azure Data Factory from Oracle On-Prem to Cosmos DB for NoSQL

Alessandro Pieri 25 Reputation points
2025-01-09T10:51:29.1933333+00:00

Hi everyone,

I'm new to Azure Data Factory and have experience with MuleSoft. I'm evaluating this tool for a pure ETL scenario and would appreciate some insights.

Specifically, I'm looking for information on its capabilities and limitations in database synchronization scenarios. Has anyone here worked on synchronizing an Oracle Database with Azure Cosmos DB? I need to perform an initial data load (IDL) of 30 million records and then handle delta UPSERT/DELETE operations (around 1.5 million records per year) across 8-10 entities/tables.

Additionally, I would like to understand how Azure Data Factory manages success cases, error handling, and retry mechanisms. Does this scenario require a complex data flow design, or can a simple copy activity cover all these requirements?

For context, the Oracle database is located in the client's on-premises data center. I'm considering using an Azure Integration Runtime in the cloud with a VNet/VPN configured from the Azure portal to their data center.

My solution would involve querying Oracle tables/entities based on LAST_UPDATE for both UPSERT and DELETE operations, so the implementation should include a watermark to resume from the last processed point in subsequent runs.
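
Conceptually, per table, I'm picturing something like the following outside of ADF (just a sketch using python-oracledb; the table name, the LAST_UPDATE column, and where the watermark is stored are placeholders from my side):

```python
# Rough sketch of the per-table delta query I have in mind (python-oracledb).
# Table name, LAST_UPDATE column, and watermark storage are placeholders.
import oracledb

def fetch_delta(conn: oracledb.Connection, table: str, last_watermark):
    """Return rows changed since the last successful run, oldest first."""
    sql = f"""
        SELECT *
          FROM {table}
         WHERE LAST_UPDATE > :wm
         ORDER BY LAST_UPDATE
    """
    with conn.cursor() as cur:
        cur.execute(sql, wm=last_watermark)
        cols = [d[0] for d in cur.description]
        return [dict(zip(cols, r)) for r in cur.fetchall()]
```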

Any advice or experiences you can share would be greatly appreciated!

Thanks!

Accepted answer
  1. Vinodh247 27,201 Reputation points MVP
    2025-01-09T16:37:42.92+00:00
    • Use a Self-Hosted Integration Runtime installed near the Oracle database; it is the usual choice for on-premises sources. An Azure Integration Runtime works only if the database is reachable from Azure over a network path you provide (e.g., site-to-site VPN or ExpressRoute) and properly secured.
    • Copy Activity suffices for the initial load and straightforward upserts. Combine it with a Data Flow (or a separate pipeline step) for deletes if needed (the sketch after this list shows the overall shape).
    • Implement a watermark strategy to handle deltas (UPSERT/DELETE), leveraging the LAST_UPDATE field or equivalent.
    • Rely on ADF’s built-in error handling and retries: set a retry count and retry interval on each activity, and use On Failure/On Completion branches for alerting or compensation steps.
    • Scale Cosmos DB RUs up during bulk loads to avoid throttling, then back down afterwards, and tune copy parallelism (see the second sketch below).
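
    To make the Copy/watermark points concrete, here is a minimal sketch of the delta leg in plain Python (in ADF the same shape is a Lookup for the watermark, a Copy for the upserts, and a follow-up step for deletes). The account/database/container names, the IS_DELETED soft-delete flag, and the save_watermark helper are illustrative assumptions, not ADF APIs:

    ```python
    # Sketch of the delta pattern: upsert/delete changed rows in Cosmos DB,
    # then advance the watermark. IS_DELETED and save_watermark are assumed,
    # and the container is assumed to be partitioned on /id.
    from azure.cosmos import CosmosClient

    client = CosmosClient("https://<account>.documents.azure.com", credential="<key>")
    container = client.get_database_client("appdb").get_container_client("customers")

    def sync_delta(rows, new_watermark, save_watermark):
        # rows = Oracle records with LAST_UPDATE greater than the old watermark
        for row in rows:
            if row.get("IS_DELETED") == "Y":              # assumed soft-delete flag
                container.delete_item(item=str(row["ID"]),
                                      partition_key=str(row["ID"]))
            else:
                row["id"] = str(row["ID"])                # Cosmos DB requires an 'id'
                container.upsert_item(row)
        save_watermark(new_watermark)                     # persist only after success
    ```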

    This setup should cover your pure ETL scenario for Oracle-to-Cosmos DB synchronization without requiring an overly complex design.
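
    On the throughput point (the last bullet above), here is a rough sketch of scaling RUs around the initial load and backing off on throttling. replace_throughput and CosmosHttpResponseError are real azure-cosmos (Python SDK) names, but the RU values and the retry policy are illustrative assumptions, and replace_throughput presumes manual (non-autoscale) throughput provisioned at the container level:

    ```python
    # Raise RUs for the 30M-record initial load, retry on throttling (429),
    # then scale back down. RU values here are illustrative only.
    import time
    from azure.cosmos import exceptions

    def bulk_load(container, rows):
        container.replace_throughput(10000)          # bump RUs for the IDL
        try:
            for row in rows:
                for attempt in range(5):
                    try:
                        container.upsert_item(row)
                        break
                    except exceptions.CosmosHttpResponseError as err:
                        if err.status_code != 429:   # only retry throttling
                            raise
                        time.sleep(2 ** attempt)     # simple exponential backoff
        finally:
            container.replace_throughput(400)        # back to steady-state RUs
    ```

    In ADF itself the equivalent knobs are the Copy Activity's retry count/interval and the Cosmos DB sink's write batch size; note that the azure-cosmos SDK also retries 429s a few times internally before raising. Let me know if you need detailed steps on any of these.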

