Disaster Recovery setup/High Availability

Xhevahir Mehalla 160 Reputation points
2024-07-20T05:33:12.57+00:00

Hello again.

I need some help deciding how to set up a DR solution.

We have the following set up in the DEV and UAT environments:

  1. One subscription where the DEV and UAT environments reside, using separate VNets and subnets.
  2. VNet peering to connect the two
  3. A VPN gateway using S2S to connect to the Oracle OCI database and to connect from on-prem to Azure
  4. Azure SQL Database as the data warehouse (Serverless - DTU model)
  5. Two Azure apps within the same B2 plan. Each app is used by the bank as a front end where they load data/files. We have an API which connects to the Azure SQL database
  6. Azure Synapse Analytics as the ETL tool to extract data from the Oracle OCI database, store it in a Data Lake Storage Gen2 account, transform it, and load it into the Azure SQL database
  7. An Azure Data Lake Gen2 account to hold the CSV files extracted from Oracle
  8. An Azure Key Vault to store logging details
  9. Private endpoints to access Azure resources privately
  10. Private DNS zones to be able to access the apps/front end.
  11. Cloudflare to add DNS records for the apps/front end. Users can access the two apps from outside the VNet. Cloudflare handles the FQDN resolution
  12. One VM (Windows) as the self-hosted integration runtime for Azure Synapse, to be able to extract data from the Oracle OCI database
  13. One Linux VM where nginx is installed, which handles the Cloudflare setup/private endpoint IP setup

All of the above is the DEV and UAT environment. No high availability has been considered for it yet.

For prod we have these requirements:

  1. A new prod subscription will be created
  2. We need to build a DR process/setup to make sure that the prod environment is always up and running, or at least that we are able to fail over and recover when a major disaster happens
  3. Both DR and prod are to be in the same subscription

I need to know:

  1. What network setup I need to use in the prod and DR environments. This is crucial; as it stands I don't know what to use, and I could read forever, but I need some guidance.
  2. How to fail over and recover each of the above resources
  3. Do I set up the prod environment (separate VNet/subnets, RG, VPN gateway) in one region and the DR environment as a separate copy of prod (VNet, subnets, RG) in a different region?
  4. Who performs this failover?
  5. What tools should be used to perform a failover?
  6. Do I need to create new DR resources, such as the apps/front end, and build everything from scratch?
  7. I do not know how to proceed with this.
  8. The client is cautious about the money spent
  9. The app is not critical, but there is an expectation that it cannot be offline for more than a few hours, or a day at most.

Please can someone help me on this and give me some guidance.

Thanks

Xhev

Azure SQL Database
Azure VPN Gateway
Azure Synapse Analytics

Accepted answer
  1. Amira Bedhiafi 25,866 Reputation points
    2024-07-20T12:13:07.28+00:00

    I am splitting your use case into different parts:

    What Network Setup Should Be Used for Prod and DR Environments?

    For your production environment, first select the primary Azure region where all resources will reside. Create a VNet with subnets that segment different types of resources, such as Web, App, and Data subnets. Organize resources into dedicated resource groups (RGs) to facilitate management.

    To ensure secure connectivity, set up a VPN gateway for linking your on-premises network and Oracle OCI database with Azure. Utilize private endpoints for secure access to Azure resources and configure private DNS zones for name resolution.

    In the DR environment, which should be in a different Azure region for geographic redundancy, mirror the production setup. This includes creating a VNet with a similar subnet structure, matching Resource Groups, and a VPN gateway for connectivity. Private endpoints and DNS zones should also be replicated to ensure seamless failover and connectivity during a disaster.
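
One practical detail when mirroring the network: the prod and DR VNets must not share address space with each other or with the networks reached over the VPN, otherwise peering and routing break. A minimal sketch of that check, using only the Python standard library. All CIDR ranges and names below are hypothetical placeholders, not values from the question.

```python
from ipaddress import ip_network
from itertools import combinations

# Hypothetical address spaces; substitute the ranges actually in use.
ADDRESS_SPACES = {
    "prod-vnet": "10.10.0.0/16",
    "dr-vnet": "10.20.0.0/16",
    "on-prem": "192.168.0.0/16",
    "oracle-oci": "172.16.0.0/16",
}


def find_overlaps(spaces):
    """Return sorted pairs of network names whose CIDR ranges overlap."""
    nets = {name: ip_network(cidr) for name, cidr in spaces.items()}
    return sorted(
        (a, b)
        for a, b in combinations(sorted(nets), 2)
        if nets[a].overlaps(nets[b])
    )


def carve_subnets(vnet_cidr, tier_names, new_prefix=24):
    """Assign consecutive /new_prefix subnets of the VNet to the given tiers."""
    subnets = ip_network(vnet_cidr).subnets(new_prefix=new_prefix)
    return {tier: str(next(subnets)) for tier in tier_names}


if __name__ == "__main__":
    print("Overlaps:", find_overlaps(ADDRESS_SPACES))
    for vnet in ("prod-vnet", "dr-vnet"):
        print(vnet, carve_subnets(ADDRESS_SPACES[vnet], ["web", "app", "data"]))
```

Using the same tier layout in both VNets keeps the DR region a structural mirror of prod while the address ranges stay distinct.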

    How to Handle Failover and Recovery for Each Resource?

    Azure SQL Database benefits greatly from active geo-replication, enabling the creation of readable secondary databases in the DR region. During a disaster, initiating a failover to the secondary database ensures minimal downtime.
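
As a rough illustration, geo-replication setup and failover can be scripted with the Azure CLI commands `az sql db replica create` and `az sql db replica set-primary`. The sketch below only builds the command lines and does not run them. Every server, database, and resource group name is a hypothetical placeholder, and the flags should be checked against the CLI version in use.

```python
import shlex


def create_replica_cmd(db, primary_server, primary_rg, dr_server, dr_rg):
    """Command that creates a geo-secondary of `db` on the DR server."""
    return [
        "az", "sql", "db", "replica", "create",
        "--name", db,
        "--server", primary_server,
        "--resource-group", primary_rg,
        "--partner-server", dr_server,
        "--partner-resource-group", dr_rg,
    ]


def failover_cmd(db, dr_server, dr_rg, forced=False):
    """Command that promotes the DR secondary to primary.

    It is issued against the secondary server. `forced=True` adds
    --allow-data-loss, for the case where the primary is unreachable.
    """
    cmd = [
        "az", "sql", "db", "replica", "set-primary",
        "--name", db,
        "--server", dr_server,
        "--resource-group", dr_rg,
    ]
    if forced:
        cmd.append("--allow-data-loss")
    return cmd


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(shlex.join(create_replica_cmd("dwh", "sql-prod", "rg-prod", "sql-dr", "rg-dr")))
    print(shlex.join(failover_cmd("dwh", "sql-dr", "rg-dr", forced=True)))
```

A planned failover (without `--allow-data-loss`) waits for the secondary to synchronize, so it suits DR drills; the forced variant is for a real outage.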

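    For Azure Synapse Analytics, use Azure Data Factory (ADF) to periodically replicate data from the primary workspace to a secondary workspace in the DR region, enabling a quick switch in case of failure.

    For Azure Data Lake Gen 2, geo-redundant storage (GRS) replicates data to the paired region. Reading from the secondary while the primary is still the active endpoint requires read-access geo-redundant storage (RA-GRS); with plain GRS, the data becomes accessible in the secondary region only after an account failover.

    Azure Key Vault does not need a separate geo-redundant storage setting: in most regions that have a paired region, its contents are replicated to the paired region automatically, and the vault stays available in read-only mode during a regional failover. If the DR region is not the paired region, create a second vault there and keep its secrets in sync.

    Azure App Services should be deployed across multiple regions using Azure Traffic Manager to automatically route traffic to the DR region if the primary is down.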

    Who Executes the Failover and What Tools Are Used?

    You can use Azure Site Recovery (ASR) to replicate the VMs and manage their failover; it supports automated failover procedures, which helps ensure quick recovery. Azure Traffic Manager handles DNS-based traffic routing, automatically directing traffic to available regions, which is particularly useful for web applications. Regular backups managed by Azure Backup protect data integrity and availability. For Azure SQL Database, active geo-replication provides a straightforward method for initiating database failovers.
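
To make the routing behaviour concrete, here is a small simulation of priority-based failover, the idea behind Traffic Manager's priority routing method: traffic goes to the healthy endpoint with the lowest priority number. This is a sketch of the selection logic only and does not call any Azure API. The `Endpoint` type and the endpoint names are hypothetical, and returning `None` when nothing is healthy is a simplification.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Endpoint:
    name: str
    priority: int  # lower number = preferred
    healthy: bool  # result of the most recent health probe


def pick_endpoint(endpoints: Iterable[Endpoint]) -> Optional[Endpoint]:
    """Return the healthy endpoint with the lowest priority number, if any."""
    healthy = [e for e in endpoints if e.healthy]
    if not healthy:
        return None
    return min(healthy, key=lambda e: e.priority)


if __name__ == "__main__":
    # Hypothetical endpoints: the primary region is down, DR is up.
    endpoints = [
        Endpoint("app-prod-primary", priority=1, healthy=False),
        Endpoint("app-prod-dr", priority=2, healthy=True),
    ]
    chosen = pick_endpoint(endpoints)
    print("Routing to:", chosen.name if chosen else "no healthy endpoint")
```

When the primary recovers and its probe turns healthy again, the same rule sends traffic back to it without manual intervention.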

    Do You Need to Create New DR Resources from Scratch?

    For DR readiness, you do not need to create everything from scratch, but you do need to ensure that all critical resources have replicated counterparts in the DR region. This includes setting up replicas of VMs, databases, storage accounts, and applications. Azure provides tools such as Azure Site Recovery (ASR) for VM replication and Traffic Manager for the failover of web applications. Regularly back up all necessary data and configurations so that the DR environment can be brought online quickly and efficiently.
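
A simple way to confirm that nothing is missed is to keep an inventory of prod resources next to the DR mechanism chosen for each, and flag anything without one. The sketch below does this with plain Python. The labels are informal names taken from the question and from the suggestions above, not Azure resource IDs, and the mapping is an illustrative assumption rather than a complete plan.

```python
# Resources listed in the question (informal labels).
PROD_INVENTORY = [
    "Azure SQL Database",
    "App Service apps",
    "Synapse workspace",
    "Data Lake Gen2",
    "Key Vault",
    "Integration runtime VM",
    "nginx VM",
    "VPN gateway",
]

# DR mechanism per resource, following the suggestions in this answer.
DR_PLAN = {
    "Azure SQL Database": "active geo-replication",
    "App Service apps": "second-region deployment behind Traffic Manager",
    "Synapse workspace": "secondary workspace in the DR region",
    "Data Lake Gen2": "geo-redundant storage",
    "Key Vault": "vault available in the DR region",
    "Integration runtime VM": "Azure Site Recovery",
    "nginx VM": "Azure Site Recovery",
    "VPN gateway": "second gateway in the DR VNet",
}


def uncovered(inventory, plan):
    """Return inventory items that have no DR mechanism, in inventory order."""
    return [resource for resource in inventory if resource not in plan]


if __name__ == "__main__":
    missing = uncovered(PROD_INVENTORY, DR_PLAN)
    print("Uncovered resources:", missing or "none")
```

Running the check whenever a resource is added to prod keeps the DR plan from silently drifting out of date.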

    How to Manage Costs While Ensuring Adequate DR Coverage?

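    Cost management is crucial, especially for resources that are not mission-critical but still require high availability. Replicate only the critical resources to keep costs manageable. Because the application can tolerate a few hours to a day of downtime, less critical components can be restored from backups or redeployed rather than kept running in the DR region. Tune backup frequency to balance cost against the desired Recovery Point Objective (RPO).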
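
The trade-off between backup frequency and recovery objectives can be checked with simple arithmetic: the worst-case data loss is roughly one backup interval, and the estimated recovery time is the sum of the recovery steps. A minimal sketch follows. All durations and targets are hypothetical; the 24-hour RTO only reflects the "a day max" expectation in the question.

```python
from datetime import timedelta


def check_objectives(backup_interval, recovery_steps, rpo, rto):
    """Compare a backup interval and recovery steps against RPO/RTO targets.

    Worst-case data loss is taken to be one full backup interval.
    Estimated recovery time is the sum of the step durations.
    """
    estimated_recovery = sum(recovery_steps.values(), timedelta())
    return {
        "worst_case_data_loss": backup_interval,
        "estimated_recovery": estimated_recovery,
        "rpo_ok": backup_interval <= rpo,
        "rto_ok": estimated_recovery <= rto,
    }


if __name__ == "__main__":
    # Hypothetical figures for illustration.
    steps = {
        "database failover": timedelta(minutes=15),
        "DNS switch": timedelta(minutes=30),
        "smoke tests": timedelta(hours=2),
    }
    result = check_objectives(
        backup_interval=timedelta(hours=6),
        recovery_steps=steps,
        rpo=timedelta(hours=4),
        rto=timedelta(hours=24),
    )
    for key, value in result.items():
        print(f"{key}: {value}")
```

With these sample numbers the RTO is met comfortably while the RPO is not, which would suggest backing up more often rather than paying for more standby capacity.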

