I am splitting your use case into separate parts:
What Network Setup Should Be Used for Prod and DR Environments?
For your production environment, start by selecting the primary Azure region where all resources will reside. Create a virtual network (VNet) with subnets that segment the different resource tiers, such as Web, App, and Data subnets. Organize resources into dedicated resource groups (RGs) to simplify management.
To ensure secure connectivity, set up a VPN gateway to link your on-premises network and your Oracle database in OCI with Azure. Use private endpoints for secure access to Azure resources, and configure private DNS zones for name resolution.
In the DR environment, which should be in a different Azure region for geographic redundancy, mirror the production setup. This includes creating a VNet with a similar subnet structure, matching Resource Groups, and a VPN gateway for connectivity. Private endpoints and DNS zones should also be replicated to ensure seamless failover and connectivity during a disaster.
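As an illustration of the mirrored layout, the following Python sketch carves matching Web, App, Data, and gateway subnets out of a production and a DR address space, then checks that the ranges do not overlap, which networks connected over VPN require. The address ranges (10.10.0.0/16, 10.20.0.0/16, 192.168.0.0/16) and the /24 subnet size are assumptions chosen for the example, not values from your environment.

```python
import ipaddress

def plan_vnet(address_space, subnet_names, new_prefix=24):
    """Carve one /new_prefix subnet per name out of the VNet address space."""
    network = ipaddress.ip_network(address_space)
    subnets = network.subnets(new_prefix=new_prefix)
    return {name: next(subnets) for name in subnet_names}

def overlaps(*address_spaces):
    """Return True if any two of the given address spaces overlap."""
    nets = [ipaddress.ip_network(a) for a in address_spaces]
    return any(a.overlaps(b) for i, a in enumerate(nets) for b in nets[i + 1:])

# "GatewaySubnet" is the subnet name Azure requires for a VPN gateway.
TIERS = ["web", "app", "data", "GatewaySubnet"]

# Assumed address spaces, for illustration only.
PROD_SPACE = "10.10.0.0/16"
DR_SPACE = "10.20.0.0/16"
ON_PREM_SPACE = "192.168.0.0/16"

prod = plan_vnet(PROD_SPACE, TIERS)
dr = plan_vnet(DR_SPACE, TIERS)

for tier in TIERS:
    print(f"{tier:14} prod={prod[tier]}  dr={dr[tier]}")

print("address spaces overlap:", overlaps(PROD_SPACE, DR_SPACE, ON_PREM_SPACE))
```

Keeping the same tier names and subnet order in both regions means firewall rules and deployment templates can be reused with only the address prefix changed.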
How to Handle Failover and Recovery for Each Resource?
For Azure SQL Database, use active geo-replication to maintain a readable secondary database in the DR region. During a disaster, you fail over to the secondary, which then becomes the new primary and keeps downtime to a minimum.
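A minimal sketch of how the failover step could be scripted. The helper below only builds the Azure CLI command `az sql db replica set-primary`, which is run against the DR server to promote its secondary database to primary. The database, resource group, and server names are hypothetical placeholders, and the command is printed rather than executed.

```python
import shlex

def failover_command(database, dr_resource_group, dr_server, forced=False):
    """Build the Azure CLI call that promotes the DR secondary to primary."""
    cmd = [
        "az", "sql", "db", "replica", "set-primary",
        "--name", database,
        "--resource-group", dr_resource_group,
        "--server", dr_server,  # the server that currently hosts the secondary
    ]
    if forced:
        # Forced failover: does not wait for the primary, so transactions
        # that were not yet replicated can be lost.
        cmd.append("--allow-data-loss")
    return cmd

# Hypothetical names, for illustration only.
planned = failover_command("appdb", "rg-dr", "sql-dr")
forced = failover_command("appdb", "rg-dr", "sql-dr", forced=True)

print(shlex.join(planned))
print(shlex.join(forced))
```

The planned form suits DR drills, where the primary is still reachable; the forced form is for a real outage where waiting for the primary is not an option.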
For Azure Synapse Analytics, use Azure Data Factory (ADF) to periodically replicate data from the primary workspace to a secondary workspace in the DR region, so that you can switch over quickly if the primary fails.
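For Azure Data Lake Storage Gen2, enable geo-redundant storage (GRS) so that data is replicated asynchronously to the paired secondary region. With plain GRS, the secondary copy only becomes accessible after an account failover; choose read-access geo-redundant storage (RA-GRS) if you need to read from the secondary region while the primary is unavailable.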
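Azure Key Vault does not need a redundancy setting: in most regions, Microsoft automatically replicates vault contents to the paired region, and the vault is read-only while it is failed over. If your DR region is not the paired region, or you need to create or update secrets during an outage, maintain a second vault in the DR region and keep its secrets in sync.
Azure App Services should be deployed in both regions behind Azure Traffic Manager, which automatically routes traffic to the DR region if the primary is down.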
Who Executes the Failover and What Tools Are Used?
You can use Azure Site Recovery (ASR) to replicate VMs and manage their failover; its recovery plans let you automate the failover sequence for quick recovery. Azure Traffic Manager handles DNS-based traffic routing and automatically directs traffic to available regions, which is particularly useful for web applications. Regular backups managed by Azure Backup protect against data loss and corruption. For Azure SQL Database, active geo-replication provides a straightforward way to initiate database failovers.
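To make the routing behaviour concrete, here is a toy model of priority-based routing: the healthy endpoint with the lowest priority number receives the traffic. This is a simplified illustration, not Traffic Manager's implementation, and the endpoint names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Endpoint:
    name: str
    priority: int   # lower number = preferred
    healthy: bool   # result of the most recent health probe

def resolve(endpoints):
    """Return the name of the preferred healthy endpoint, or None.

    Simplification: this sketch returns None when no endpoint is healthy.
    """
    candidates = [e for e in endpoints if e.healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda e: e.priority).name

# Hypothetical endpoints, for illustration only.
normal = [Endpoint("app-primary", 1, True), Endpoint("app-dr", 2, True)]
outage = [Endpoint("app-primary", 1, False), Endpoint("app-dr", 2, True)]

print("normal operation ->", resolve(normal))
print("primary down     ->", resolve(outage))
```

Because the switch happens through DNS, clients only move to the DR endpoint once their cached DNS record expires, so the profile's TTL contributes to the effective recovery time.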
Do You Need to Create New DR Resources from Scratch?
For DR readiness, you do not need to create everything from scratch, but you do need to ensure that every critical resource has a replicated counterpart in the DR region. This includes replicas of VMs, databases, storage accounts, and applications. Azure provides tools such as Azure Site Recovery (ASR) for VM replication and Traffic Manager for failing over web applications. Regularly back up all necessary data and configurations so that the DR environment can be brought online quickly and efficiently.
How to Manage Costs While Ensuring Adequate DR Coverage?
Cost management is crucial, especially for resources that are not mission-critical but still require high availability. Replicate only the critical resources to keep costs manageable, and tune backup frequency to balance cost against the desired Recovery Point Objective (RPO).
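The trade-off between backup frequency and RPO can be sketched with simple arithmetic. The example assumes evenly spaced backup runs and treats the worst-case data loss as one full interval between runs, ignoring how long each run takes. The tier names and RPO values are hypothetical.

```python
import math

def min_runs_per_day(rpo_minutes):
    """Fewest evenly spaced runs per day whose interval stays within the RPO."""
    if rpo_minutes <= 0:
        raise ValueError("RPO must be positive")
    return math.ceil(24 * 60 / rpo_minutes)

# Hypothetical tiers: name -> RPO in minutes.
TIERS = {"critical": 15, "important": 240, "low": 1440}

for name, rpo in TIERS.items():
    runs = min_runs_per_day(rpo)
    print(f"{name}: RPO {rpo} min -> at least {runs} run(s) per day")
```

Relaxing the RPO for non-critical tiers reduces the number of runs, and with it the storage and transfer cost, which is where most of the savings usually come from.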