Tutorial: Migrate WebSphere Liberty/Open Liberty to Azure Kubernetes Service with high availability and disaster recovery

This tutorial shows you a simple and effective way to implement high availability and disaster recovery (HA/DR) for Java using WebSphere Liberty/Open Liberty on Azure Kubernetes Service (AKS). The solution illustrates how to achieve a low Recovery Time Objective (RTO) and Recovery Point Objective (RPO) using a simple database-driven Jakarta EE application running on WebSphere Liberty/Open Liberty.

HA/DR is a complex topic, with many possible solutions. The best solution depends on your unique requirements. For other ways to implement HA/DR, see the resources at the end of this article.

In this tutorial, you learn how to:

  • Use Azure optimized best practices to achieve high availability and disaster recovery.
  • Set up a Microsoft Azure SQL Database failover group in paired regions.
  • Set up the primary WebSphere Liberty/Open Liberty cluster on AKS.
  • Set up disaster recovery for the cluster using Azure Backup.
  • Set up the secondary AKS cluster.
  • Set up an Azure Traffic Manager.
  • Test failover from primary to secondary.

The following diagram illustrates the architecture you build:

Diagram of the solution architecture of WebSphere Liberty/Open Liberty on AKS with high availability and disaster recovery.

Azure Traffic Manager checks the health of your regions and routes the traffic accordingly to the application tier. Both the primary region and the secondary region have a full deployment of the Liberty cluster. However, only the primary region is actively servicing network requests from the users. The secondary region is passive, and is activated to receive traffic only when the primary region experiences a service disruption. Azure Traffic Manager uses the health check feature of the Azure Application Gateway to implement this conditional routing. The primary cluster is running and the secondary cluster is shut down. The geo-failover RTO of the application tier depends on the time for starting virtual machines (VMs) and running the secondary cluster. The RPO depends on the Azure SQL Database because the data is persisted and replicated in the Azure SQL Database failover group.

The database tier consists of an Azure SQL Database failover group with a primary server and a secondary server. The read/write listener endpoint always points to the primary server and is connected to a WebSphere Liberty/Open Liberty cluster in each region. A geo-failover switches all secondary databases in the group to the primary role. For the geo-failover RPO and RTO of Azure SQL Database, see Overview of business continuity with Azure SQL Database.

This tutorial was written with the Azure Backup and Azure SQL Database services because the tutorial relies on the HA features of these services. Other database choices are possible, but you must consider the HA features of whatever database you choose.

Prerequisites

Set up an Azure SQL Database failover group in paired regions

In this section, you create an Azure SQL Database failover group in paired regions for use with your WebSphere Liberty/Open Liberty clusters and app. In a later section, you configure WebSphere Liberty/Open Liberty to store its session data to this database. This practice references Creating a table for session persistence.

First, create the primary Azure SQL Database by following the Azure portal steps in Quickstart: Create a single database - Azure SQL Database. Follow the steps up to, but not including, the "Clean up resources" section. Use the following directions as you go through the article, then return to this article after you create and configure the Azure SQL Database.

When you reach the section Create a single database, use the following steps:

  1. In step 4 for creating new resource group, save aside the Resource group name value - for example, myResourceGroup.
  2. In step 5 for database name, save aside the Database name value - for example, mySampleDatabase.
  3. In step 6 for creating the server, use the following steps:
    1. Fill in a unique server name - for example, sqlserverprimary-mjg032524.
    2. For Location, select (US) East US.
    3. For Authentication method, select Use SQL authentication.
    4. Save aside the Server admin login value - for example, azureuser.
    5. Save aside the Password value.
  4. In step 8, for Workload environment, select Development. Look at the description and consider other options for your workload.
  5. In step 11, for Backup storage redundancy, select Locally-redundant backup storage. Consider other options for your backups. For more information, see the Backup storage redundancy section of Automated backups in Azure SQL Database.
  6. In step 14, in the Firewall rules configuration, for Allow Azure services and resources to access this server, select Yes.

Then, create an Azure SQL Database failover group by following the Azure portal steps in Configure a failover group for Azure SQL Database. You just need the following sections: Create failover group and Test planned failover. Use the following steps as you go through the article, then return to this article after you create and configure the Azure SQL Database failover group:

  1. When you reach the section Create failover group, use the following steps:

    1. In step 5 for creating the failover group, enter and save aside the unique failover group name - for example, failovergroup-mjg032524.
    2. In step 5 for configuring the server, select the option to create a new secondary server and then use the following steps:
      1. Enter a unique server name - for example, sqlserversecondary-mjg032524.
      2. Enter the same server admin and password as your primary server.
      3. For Location, select (US) West US.
      4. Make sure Allow Azure services to access server is selected.
    3. In step 5 for configuring the Databases within the group, select the database you created in the primary server - for example, mySampleDatabase.
  2. After you complete all the steps in the section Test planned failover, keep the failover group page open and use it for the failover test of the WebSphere Liberty/Open Liberty clusters later.

Set up the primary WebSphere Liberty/Open Liberty cluster on AKS

In this section, you create the primary WebSphere Liberty/Open Liberty cluster on AKS using the IBM WebSphere Liberty and Open Liberty on Azure Kubernetes Service offer. The secondary cluster is restored from the primary cluster during the failover using Azure Backup later.

Deploy the primary WebSphere Liberty/Open Liberty cluster

Use the following steps to deploy the primary cluster:

  1. Open the IBM WebSphere Liberty and Open Liberty on Azure Kubernetes Service offer in your browser and select Create. You should see the Basics pane of the offer.

  2. Use the following steps to fill out the Basics pane:

    1. Ensure that the value shown for Subscription is the same one that has the roles listed in the prerequisites section.
    2. You must deploy the offer in an empty resource group. In the Resource group field, select Create new and fill in a unique value for the resource group - for example, liberty-aks-eastus-mjg032524.
    3. Under Instance details, for Region, select East US.
    4. Select Next to go to the AKS pane.

    Screenshot of the Azure portal that shows the IBM WebSphere Liberty and Open Liberty on Azure Kubernetes Service Basics pane.

  3. Wait for a while. You should see all fields prepopulated with the defaults in the AKS pane. Select Next to go to the Load balancing pane.

    Screenshot of the Azure portal that shows the IBM WebSphere Liberty and Open Liberty on Azure Kubernetes Service AKS pane.

  4. Use the following steps to fill out the Load balancing pane:

    1. For Connect to Azure Application Gateway?, select Yes.
    2. Leave the defaults for other fields.
    3. Select Next to go to the Operator and application pane.

    Screenshot of the Azure portal that shows the IBM WebSphere Liberty and Open Liberty on Azure Kubernetes Service Load balancing pane.

  5. Use the following steps to fill out the Operator and application pane:

    1. Leave the defaults for all fields.

      Note

      This tutorial deploys Open Liberty Operator using the defaults. Optionally you can deploy WebSphere Liberty Operator by selecting Yes for IBM supported?.

    2. Select Review + create.

    3. Wait until Running final validation... successfully completes, then select Create.

    Screenshot of the Azure portal that shows the IBM WebSphere Liberty and Open Liberty on Azure Kubernetes Service Operator and application pane.

After a while, you should see the Deployment page where Deployment is in progress is displayed.

Note

If you see any problems during Running final validation..., fix them and try again.

Depending on network conditions and other activity in your selected region, the deployment can take up to about 30 minutes to complete. After that, you should see the text Your deployment is complete displayed on the deployment page.

Verify the deployment of the cluster

You deployed an AKS cluster, an Azure Container Registry (ACR) instance, and an Azure Application Gateway in the primary region. The AKS cluster is the target computing platform where your app is deployed and running. The ACR instance stores the application image that AKS pulls during the app deployment. The Azure Application Gateway acts as load balancer for the application deployed to the AKS cluster.

Use the following steps to verify these key components before you move to the next step:

  1. Return to the Deployment page, then select Outputs.

  2. Copy the value of the property cmdToConnectToCluster. Open a terminal, paste the copied command, and press Enter to execute. You should see a message similar to the following example included in the output:

    Merged "cluster3984d1-admin" as current context in <your-user>\.kube\config
    
  3. Save aside the command so you can use it to connect to the cluster later.

  4. Run kubectl get pod --all-namespaces in the terminal to list all pods running on the AKS cluster. You should see output similar to the following example:

    NAMESPACE      NAME                                        READY   STATUS    RESTARTS      AGE
    cert-manager   cert-manager-66bc9756fd-255pk               1/1     Running   0             8m55s
    cert-manager   cert-manager-cainjector-669c9fb694-k4q88    1/1     Running   0             8m55s
    cert-manager   cert-manager-webhook-84967d556d-vj4lp       1/1     Running   0             8m55s
    kube-system    azure-ip-masq-agent-dgzkt                   1/1     Running   0             29m
    kube-system    cloud-node-manager-6x7bp                    1/1     Running   0             29m
    kube-system    coredns-789789675-6b7dh                     1/1     Running   0             28m
    kube-system    coredns-789789675-n68wt                     1/1     Running   0             29m
    kube-system    coredns-autoscaler-649b947bbd-zhdbn         1/1     Running   0             29m
    kube-system    csi-azuredisk-node-h9p7m                    3/3     Running   0             29m
    kube-system    csi-azurefile-node-jnllw                    3/3     Running   0             29m
    kube-system    ingress-appgw-deployment-69944d8fb9-v9btr   1/1     Running   5 (12m ago)   17m
    kube-system    konnectivity-agent-94878f88c-hfqng          1/1     Running   0             29m
    kube-system    konnectivity-agent-94878f88c-ln2vp          1/1     Running   0             29m
    kube-system    kube-proxy-28lkg                            1/1     Running   0             29m
    kube-system    metrics-server-5fffcb8954-549xl             2/2     Running   0             28m
    kube-system    metrics-server-5fffcb8954-fn56g             2/2     Running   0             28m
    open-liberty   olo-controller-manager-7954d76cf8-qhmxw     1/1     Running   0             8m40s
    
  5. Run kubectl get secret in the terminal to list all secrets installed on the AKS cluster. You should see one secret in the output, as shown in the following example:

    NAME           TYPE                DATA   AGE
    secret3984d1   kubernetes.io/tls   2      24m
    

    This secret is a TLS secret that includes certificate and key data for TLS traffic. Copy and save aside the name of the secret - for example, secret3984d1, you use it in the app deployment later.

  6. Switch back to Outputs page. Copy the value of the property cmdToLoginInRegistry. Paste the copied command in the terminal and press Enter to execute. You should see Login Succeeded in the output. Keep the terminal open and use it for further configuration of the WebSphere Liberty/Open Liberty cluster later.

Use the following steps to get the name and DNS name of the public IP address of the Azure Application Gateway. You use them for app deployment and the Azure Traffic Manager setup later.

  1. In the Azure portal, in the search box, enter Resource groups and select Resource groups from the search results.
  2. Select the name of resource group for your primary region - for example, liberty-aks-eastus-mjg032524.
  3. Find the Public IP address resource prefixed with gwip, then copy and save aside its name.
  4. Select the Public IP address resource, then copy and save aside the DNS name - for example, olgw3984d1.eastus.cloudapp.azure.com.

Enable geo-replications for the ACR instance

The ACR instance is designed to store application images for both primary and secondary clusters. Use the following steps to enable geo-replications for the ACR instance:

  1. In the Azure portal, in the search box, enter Resource groups and select Resource groups from the search results.

  2. Select the name of resource group for your primary region - for example, liberty-aks-eastus-mjg032524.

  3. Find the Container registry resource prefixed with acr, then select it to open it.

  4. Select Properties. For Pricing plan, select Premium, then select Save. Wait until completion.

  5. Select Geo-replications, then select Add. For Location, select West US, then select Create. Wait until completion.

  6. Wait for a while, select Refresh. Repeat the operation until you see two locations are listed and Status is Ready.

    Screenshot of the Azure portal that shows the ACR instance enabled with geo-replications in pair regions.

Use the following steps to get the ACR sign-in credentials. You use them for app deployment later.

  1. Select Access keys.
  2. Copy and save aside the value for Login server, Username, and password.

Deploy a sample app

Use the following steps to deploy and run a sample CRUD Java/Jakarta EE application on WebSphere Liberty/Open Liberty cluster for disaster recovery failover test later:

  1. Download the sample by using the following commands:

    git clone https://github.com/Azure-Samples/open-liberty-on-aks.git
    cd open-liberty-on-aks
    export BASE_DIR=$PWD
    git checkout 20240325
    

    The application configures a data source jdbc/JavaEECafeDB that connects to the Azure SQL Database you deployed previously. The data source is used for storing HTTP session data, which enables failover and load balancing across a cluster of WebSphere Liberty/Open Liberty servers. The sample app also configures a persistence schema to persist application data coffee in the same datasource. Notice that the context root of the sample is configured as / in the server.xml file.

  2. Use the following commands to define environment variables with the values that you saved aside previously:

    export DB_SERVER_NAME=<failover-group-name>.database.windows.net
    export DB_NAME=mySampleDatabase
    export DB_USER=azureuser@<failover-group-name>
    export DB_PASSWORD='<SQL-Server-admin-login-password>'
    export LOGIN_SERVER=<ACR-login-server>
    export USER_NAME=<ACR-username>
    export PASSWORD='<ACR-password>'
    export INGRESS_TLS_SECRET=<TLS-secret-name>
    
  3. Use the following commands to package the app, build the Docker image, push the image to the ACR instance, and deploy the sample to the AKS cluster:

    cd $BASE_DIR/java-app
    mvn clean install
    
    cd $BASE_DIR/java-app/target
    
    # If you deployed WebSphere Liberty Operator previously, use "Dockerfile-wlp" instead of "Dockerfile"
    docker buildx build --platform linux/amd64 -t javaee-cafe:v1 --pull --file=Dockerfile .
    docker tag javaee-cafe:v1 ${LOGIN_SERVER}/javaee-cafe:v1
    docker login -u ${USER_NAME} -p ${PASSWORD} ${LOGIN_SERVER}
    docker push ${LOGIN_SERVER}/javaee-cafe:v1
    
    cd $BASE_DIR/java-app/target
    kubectl apply -f db-secret.yaml
    
    # If you deployed WebSphere Liberty Operator previously, use "webspherelibertyapplication-agic.yaml" instead of "openlibertyapplication-agic.yaml"
    kubectl apply -f openlibertyapplication-agic.yaml
    
  4. Run the following command to get the sample app you deployed:

    # If you deployed WebSphere Liberty Operator previously, use "WebSphereLibertyApplication" instead of "OpenLibertyApplication"
    kubectl get OpenLibertyApplication
    

    You should see one READY application in the output:

    NAME                       IMAGE                                 EXPOSED   RECONCILED   RESOURCESREADY   READY   AGE
    javaee-cafe-cluster-agic   acr3984d1.azurecr.io/javaee-cafe:v1             True         True             True    45s
    
  5. Run the following command to get status of the pods created during the deployment:

    kubectl get pods
    

    The following example indicates that all the pods are running. If you don't see similar output, wait for a while and repeat the operation.

    NAME                                        READY   STATUS    RESTARTS   AGE
    javaee-cafe-cluster-agic-6bbb8d6f5c-2xjc4   1/1     Running   0          1m
    javaee-cafe-cluster-agic-6bbb8d6f5c-4f449   1/1     Running   0          1m
    javaee-cafe-cluster-agic-6bbb8d6f5c-m2wg6   1/1     Running   0          1m
    
  6. Use the following steps to verify that the app is running as expected:

    1. In a new browser tab, open the DNS name of the public IP address of the Azure Application Gateway that you saved aside previously. Use the https protocol - for example, https://olgw3984d1.eastus.cloudapp.azure.com. You should see the welcome page of sample app.

    2. Create a new coffee with name and price - for example, Coffee 1 with price 10 - which is persisted into both the application data table and the session table of the database. The UI that you see should be similar to the following screenshot:

      Screenshot of the sample application UI.

    If your UI doesn't look similar, troubleshoot and resolve the problem before you continue.

Set up disaster recovery for the cluster using Azure Backup

In this section, you set up disaster recovery for the AKS cluster in the primary region using Azure Backup.

Create a storage account

AKS backup uses a blob container to hold the AKS cluster resources. You create another blob container as a staging location for use during cross region restoring.

Use the following steps to create a storage account and two containers. Some of these steps direct you to other guides.

  1. Sign in to the Azure portal.
  2. Create a storage account by following the steps in Create a storage account. You don't need to do all the steps in the article. Fill out the fields as shown on the Basics pane, using the following steps:
    1. For Resource group, select the existing resource group where the primary cluster is deployed - for example, liberty-aks-eastus-mjg032524.
    2. For Storage account name, enter a unique name - for example, storageeastusmjg032524.
    3. For Region, select East US.
    4. Select Review + create to accept the default options.
    5. Proceed to validate and create the account, then return to this article.
  3. Create a storage container for the AKS Backup Extension by following the steps in Create a storage container. This guide uses aks-backup-ext as the container name.
  4. Create another storage container as a staging location for use during restoring. This guide uses staging as the container name.

Enable the AKS Backup Extension

Before you continue, use the following steps to install the AKS Backup Extension to the cluster in primary region:

  1. Enable the CSI drivers and snapshots for your cluster. For the following az aks update command, update the value of Bash variable RG_NAME to your resource group name - for example, liberty-aks-eastus-mjg032524 - and run in your local Bash terminal.

    export RG_NAME=<your-aks-cluster-resource-group>
    export AKS_NAME=$(az aks list \
        --resource-group ${RG_NAME} \
        --query "[0].name" \
        --output tsv | tr -d '\r')
    
    az aks update \
        --resource-group ${RG_NAME} \
        --name ${AKS_NAME} \
        --enable-disk-driver \
        --enable-file-driver \
        --enable-blob-driver \
        --enable-snapshot-controller --yes
    

    It takes about 5 minutes to enable the drivers. Make sure the commands complete without error before you continue.

  2. Open the resource group that has AKS deployed - for example, liberty-aks-eastus-mjg032524. Select the AKS cluster from resources list.

  3. Under Settings of the AKS landing page, select Back up, then select Install Extension.

  4. In the Install AKS Backup extension page, select Next. Select the storage account storageeastusmjg032524 and the blob container aks-backup-ext created in the same resource group. Select Next, then select Create. It takes about five minutes to complete this step.

Back up the AKS cluster

Use the following steps to back up the AKS cluster:

  1. In the Azure portal, in the search box, search for Backup vaults. You see it listed under Services. Select it.

  2. Follow the steps in Back up Azure Kubernetes Service by using Azure Backup to enable AKS Backup for the primary cluster. Execute the steps up to, but not including, the Use hooks during AKS backup section, and use the rest of the steps in this section to make adjustments as you go.

  3. When you reach the Create a Backup vault section, use the following steps:

    1. For step 1, for Resource group, select the existing resource group where the primary cluster is deployed - for example, liberty-aks-eastus-mjg032524.

    2. For Backup vault name, enter a unique value - for example, aks-backup-vault-eastus-mjg032524.

    3. For Region, select East US.

    4. For Backup Storage Redundancy, select Globally-Redundant.

      Screenshot of the Azure portal that shows the Backup Vault Basic pane.

    5. For step 2, for Cross Region Restore, select Enable.

  4. When you reach the Create a backup policy section, use the following steps:

    1. For step 3, enter a name for the backup policy - for example, aksbackuppolicy.

    2. Select the Backup vault you created in the same resource group - for example, aks-backup-vault-eastus-mjg032524.

    3. For step 4, add a retention rule where Vault-standard is selected.

      Screenshot of the Azure portal that shows the Create Backup Policy page with the Add retention pane open and the Vault-standard option highlighted.

    4. Select Add.

  5. In the Configure backups section, use the following steps:

    1. Skip step 1-5, which are for AKS Extension installation. Start from step 6 for the AKS cluster in the primary region.

    2. For step 7, for Vault, select the backup vault you created in the same resource group - for example, aks-backup-vault-eastus-mjg032524. When you run into permission errors, select Grant Permissions to move on. After the permission deployment completes, if the error still shows, select Revalidate to refresh the role assignments.

      Screenshot of the Azure portal that shows the Configure Backup Basics pane with permission errors and with the Grant Permissions link highlighted.

    3. For step 10, find Select Resources to Backup. For Backup Instance name, fill in a unique name - for example, akseastusmjg032524. For Other options, select all the options. Make sure Include Secrets is selected.

      Screenshot of the Azure portal that shows the Select Resources to Backup pane with the Include Secrets option highlighted.

    4. For step 11, you run into Role assignment error. Follow step 12-14 to mitigate the error.

      Screenshot of the Azure portal that shows the Configure Backup pane with the Grant missing permissions dialog box open.

    5. After you select Configure backup in step 15, you return to the Backup page. Wait for a while and then select Refresh. Repeat the operation until you see that the backup instance is listed and its Protection status is Protection configured.

      Screenshot of the Azure portal that shows the AKS backup instance protection is configured.

Wait for a Vault-standard backup to happen

In AKS, the Vault-standard Tier is the only tier that supports Geo-redundancy and Cross Region Restore. As stated in Which backup storage tier does AKS backup support?, "Only one scheduled recovery point per day is moved to Vault Tier." You must wait for a Vault-standard backup to happen. A good lower bound is to wait at most 24 hours after completing the previous step before you restore.

Use the following steps to verify that a Vault-standard backup is available:

  1. In Backup page of the primary AKS cluster, select the backup instance.

  2. Wait for a while and select Refresh. Repeat the operation until you see that at least one Operational and Vault-standard restore point is listed in the RESTORE POINTS section.

    Screenshot of the Azure portal that shows the Restore points section with the Operational and Vault-standard restore point highlighted.

Set up the secondary AKS cluster

While waiting for a Vault-standard backup for the primary AKS cluster to happen, set up your secondary AKS cluster for restoring later.

Use the same steps in the section Deploy the primary WebSphere Liberty/Open Liberty cluster to set up the secondary AKS cluster in the secondary region, except for the following differences:

  1. In the Basics pane, use the following steps:

    1. In the Resource group field, select Create new and fill in a different unique value for the resource group - for example, liberty-aks-westus-mjg032524.
    2. Under Instance details, for Region, select West US.
  2. In the AKS pane, use the following steps:

    1. Under Azure Container Registry (ACR), for Select ACR instance, select No.

    2. Select the existing ACR instance in the primary region that enabled with geo-replications.

      Screenshot of the Azure portal Create IBM WebSphere Liberty and Open Liberty on Azure Kubernetes Service page with the ACR instance highlighted.

Use the same steps in the section Verify the deployment of the cluster to verify the deployment in the secondary region, except for the following differences:

  1. You don't need to copy and save aside the name of the TLS secret. The TLS secret is restored from the backup of the primary AKS cluster.
  2. Use the resource group of the secondary cluster - for example, liberty-aks-westus-mjg032524 - when you look up the the name and DNS name of the public IP address of the Azure Application Gateway deployed in the secondary region.

Use the same steps in the section Create a storage account to create a storage account in secondary region, except for the following differences:

  1. For Resource group field, select the existing resource group where the secondary cluster is deployed - for example, liberty-aks-westus-mjg032524.
  2. For Storage account name, enter a unique name - for example, storagewestusmjg032524.
  3. For Region, select West US.

Use the same steps in the section Enable the AKS Backup Extension to install the AKS Backup Extension for the cluster in the secondary region, except for the following differences:

  1. In step 1 for enabling the CSI drivers and snapshots for your secondary cluster, update the value of Bash variable RG_NAME to the resource group in the secondary region - for example, liberty-aks-westus-mjg032524.
  2. In step 2, select the AKS cluster from the resource group in the secondary region - for example, liberty-aks-westus-mjg032524.
  3. In step 4 for installing the AKS Backup Extension for your secondary cluster, select the storage account you created in the same resource group of the secondary region - for example, storagewestusmjg032524.

To save costs, stop the AKS cluster in the secondary region by following the steps in Stop and start an Azure Kubernetes Service (AKS) cluster. You need to start it before you restore the cluster later.

Set up an Azure Traffic Manager

Vault-standard backup was mentioned in section Wait for a Vault-standard backup to happen. After you see that a Vault-standard backup is available, you can create an Azure Traffic Manager for distributing traffic to your public facing applications across the global Azure regions. The primary endpoint points to the public IP address of the Azure Application Gateway in the primary region. The secondary endpoint points to the public IP address of the Azure Application Gateway in the secondary region.

Create an Azure Traffic Manager profile by following the steps in Quickstart: Create a Traffic Manager profile using the Azure portal. You just need the following sections: Create a Traffic Manager profile and Add Traffic Manager endpoints. Use the following steps as you go through these sections, then return to this article after you create and configure the Azure Traffic Manager:

  1. When you reach the section Create a Traffic Manager profile, in step 2, for Create Traffic Manager profile, use the following steps:

    1. For Name, enter a unique Traffic Manager profile name - for example, tmprofile-mjg032524.
    2. For Routing method, select Priority.
    3. For Resource group, enter and save aside the new resource group name - for example, myResourceGroupTM1.
  2. When you reach the section Add Traffic Manager endpoints, use the following steps:

    1. After you open the Traffic Manager profile in step 2, in the Configuration page, use the following steps:
      1. For DNS time to live (TTL), enter 10.
      2. Under Endpoint monitor settings, for Protocol, select https, and for Port, enter 443.
      3. Under Fast endpoint failover settings, use the following values:
        • For Probing internal, select 10.
        • For Tolerated number of failures, enter 3.
        • For Probe timeout, enter 5.
      4. Select Save. Wait until it completes.
    2. In step 4 for adding the primary endpoint myPrimaryEndpoint, use the following steps:
      1. For Target resource type, select Public IP address.
      2. Select the Choose public IP address dropdown and enter the name of the public IP address of the Azure Application Gateway in the East US region that you saved aside previously. You should see one entry matched. Select it for Public IP address.
    3. In step 6 for adding a failover/secondary endpoint myFailoverEndpoint, use the following steps:
      1. For Target resource type, select Public IP address.
      2. Select the Choose public IP address dropdown and enter the name of the public IP address of the Azure Application Gateway in the West US region that you saved aside previously. You should see one entry matched. Select it for Public IP address.
    4. Wait for a while. Select Refresh until the Monitor status for endpoint myPrimaryEndpoint is Online and Monitor status for endpoint myFailoverEndpoint is Degraded.

Next, use the following steps to verify that the sample app deployed to the primary cluster is accessible from the Traffic Manager profile:

  1. Select Overview of the Traffic Manager profile you created.

  2. Check and copy down the DNS name of the Traffic Manager profile, replacing the protocol http with https. For example, https://tmprofile-mjg032524.trafficmanager.net.

  3. Open the URL in a new browser tab. You should see that the coffee you created previously is listed in the page.

  4. Create another coffee with a different name and price - for example, Coffee 2 with price 20 - which is persisted into both the application data table and the session table of the database. The UI that you see should be similar to the following screenshot:

    Screenshot of the sample application UI with the second coffee.

If your UI doesn't look similar, troubleshoot and resolve the problem before you continue. Keep the console open and use it for the failover test later.

You've completed the Traffic Manager profile setup. Keep the page open and you use it for monitoring the endpoint status change in a failover event later.

Test failover from primary to secondary

In this section, to test failover, you manually failover your Azure SQL Database server and restore the backup of your AKS cluster, and then fail back using the Azure portal.

Failover to the secondary site

To simulate an outage of the primary region, stop the primary AKS cluster by following the steps in Stop and start an Azure Kubernetes Service (AKS) cluster.

Next, start the secondary AKS cluster so it can be restored from the backup of the primary cluster.

Note

If you have WebSphere Liberty/Open Liberty applications running on the restore target cluster, to avoid conflicts, use the following steps to clean up WebSphere Liberty/Open Liberty applications:

  • Connect to the target cluster by running the command for cmdToConnectToCluster that you saved aside previously.

  • For Open Liberty applications, run the following command:

    kubectl delete OpenLibertyApplication --all --all-namespaces
    
  • For WebSphere Liberty applications, run the following command:

    kubectl delete WebSphereLibertyApplication --all --all-namespaces
    

Then, switch to the browser tab of your Traffic Manager profile, and verify that the Monitor status for both endpoints myPrimaryEndpoint and myFailoverEndpoint is Degraded.

Now, use the following steps to failover the Azure SQL Database from the primary server to the secondary server:

  1. Switch to the browser tab of your Azure SQL Database failover group - for example, failovergroup-mjg032524.
  2. Select Failover, then select Yes.
  3. Wait until it completes.

Next, use the following steps to restore the backup of the primary AKS cluster to the secondary AKS cluster:

  1. In the Azure portal, in the search box, enter Backup center and select Backup center from the search results.

  2. Under Manage, select Backup instances. Filter on the datasource type Kubernetes Services. Find the backup instance you created in the previous section - for example, <aks-cluster-name>\akseastusmjg032524.

  3. Select the backup instance.

  4. Select Restore.

  5. On the Restore page, the default pane is Restore point. Select Previous to change to the Basics pane. For Restore Region, select Secondary Region, then select Next: Restore point.

    Screenshot of the Azure portal that shows the Restore Basics pane.

  6. In the Restore point pane, the latest Operational and Vault-standard restore point is selected. Keep the defaults and select Next: Restore parameters.

    Screenshot of the Azure portal that shows the Restore point pane.

  7. On the Restore parameters pane, use the following steps:

    1. For Select Target cluster, select the secondary AKS cluster that you created in the West US region. You run into a permission issue as shown in the following screenshot. Select Grant Permission to mitigate the errors.

    2. For Backup Staging Location, select the Storage Account that you created in the West US region. You run into a permission issue as shown in the following screenshot. Select Assign missing roles to mitigate the errors.

      Screenshot of the Azure portal that shows the Restore parameters pane.

    3. If the errors still happen after the role assignments finish, select Revalidate to refresh the permissions.

    4. When granting missing permissions, if asked to specify a Scope, accept the default value.

    5. Select Validate. You should see the message, Validation completed successfully. Otherwise, troubleshoot and resolve the problem before you continue.

  8. Select Next: Review + restore. Then select Restore. It takes about 10 minutes to restore the cluster.

  9. You can monitor the restore process from Backup center > Monitoring + reporting > Backup jobs, as shown in the following screenshot:

    Screenshot of the Azure portal that shows a CrossRegionRestore in progress.

  10. Wait for a while, then select Refresh. Repeat the operation until you see that Status becomes Completed.

Then, use the following steps to verify that the restore works as expected:

  1. Switch to the terminal where you connected to the secondary AKS cluster.

  2. Run the following command to get the sample app restored from the backup:

    kubectl get OpenLibertyApplication
    

    You should see one READY application in the output:

    NAME                       IMAGE                                 EXPOSED   RECONCILED   RESOURCESREADY   READY   AGE
    javaee-cafe-cluster-agic   acr3984d1.azurecr.io/javaee-cafe:v1             True         True             True    3m
    
  3. Run the following command to get status of the pods created during the deployment:

    kubectl get pods
    

    You should see three Running pods in the output:

    NAME                                        READY   STATUS    RESTARTS   AGE
    javaee-cafe-cluster-agic-7bb57dd945-6ljll   1/1     Running   0          3m
    javaee-cafe-cluster-agic-7bb57dd945-h2xdf   1/1     Running   0          3m
    javaee-cafe-cluster-agic-7bb57dd945-k744w   1/1     Running   0          3m
    
  4. Switch to the browser tab of your Traffic Manager profile, then refresh the page until you see that Monitor status for endpoint myFailoverEndpoint is Online and Monitor status for endpoint myPrimaryEndpoint is Degraded.

  5. Switch to the browser tab with the DNS name of the Traffic Manager profile - for example, https://tmprofile-mjg032524.trafficmanager.net. Refresh the page and you should see the same data persisted in the application data table and the session table displayed. The UI that you see should be similar to the following screenshot:

    Screenshot of the sample application UI after failover.

    If you don't observe this behavior, it might be because the Traffic Manager is taking time to update DNS to point to the failover site. The problem could also be that your browser cached the DNS name resolution result that points to the failed site. Wait for a while and refresh the page again.

    Note

    The app configures session timeout as 1 hour. Depending on how much time it took to failover, you might not see session data displayed in the New coffee section of the sample app UI if it expired more than one hour previously.

Re-protect the failover site

Now that the secondary region is the failover site and is active, you should re-protect it with Azure Backup.

First, use the same steps in the section Back up the AKS cluster to back up the secondary AKS cluster, except for the following differences:

  1. For Create a Backup vault, use the following steps:
    1. For Resource group, select the existing resource group deployed in the secondary region - for example, liberty-aks-westus-mjg032524.
    2. For Backup vault name, enter a unique value - for example, aks-backup-vault-westus-mjg032524.
    3. For Region, select West US.
  2. For Create a backup policy, use the following steps:
    1. Select the Backup vault you created in the secondary region - for example, aks-backup-vault-westus-mjg032524.
  3. For Configure backups, use the following steps:
    1. Select the Backup vault you created in the secondary region - for example, aks-backup-vault-westus-mjg032524.
    2. For Backup Instance name, fill in a unique name - for example, akswestusmjg032524.

Then, use the same steps in the section Wait for a Vault-standard backup to happen to wait until a Vault-standard backup of the secondary AKS cluster is available, except select the backup instance from the Backup page of the secondary AKS cluster.

Fail back to the primary site

Use the same steps in the Failover to the secondary site section to fail back to the primary site including database server and the AKS cluster, except for the following differences:

  1. When you prepare for the failback, use the following steps:

    1. Stop the secondary AKS cluster to simulate an outage of the secondary region.
    2. Start the primary AKS cluster.
    3. Connect to the primary AKS cluster and clean up WebSphere Liberty/Open Liberty applications.
  2. When you restore the backup of the secondary AKS cluster to the primary AKS cluster, use the following steps:

    1. Select the backup instance in the secondary region - for example, <aks-cluster-name>\akswestusmjg032524.
    2. In the Restore parameters pane, use the following steps:
      1. For Select Target cluster, select the primary AKS cluster that you created in the East US region.
      2. For Backup Staging Location, select the Storage Account that you created in the East US region.
  3. When you verify that the restore works as expected, use the following steps:

    1. Switch to the terminal where you connected to the primary AKS cluster, and check that the app is restored successfully.
    2. Switch to the browser tab of your Traffic Manager profile, then refresh the page until you see that Monitor status for endpoint myPrimaryEndpoint is Online and Monitor status for endpoint myFailoverEndpoint is Degraded.

Clean up resources

If you're not going to continue to use the WebSphere Liberty/Open Liberty clusters and other components, use the following steps to delete the resource groups to clean up the resources used in this tutorial:

  1. In the Azure portal, in the search box, enter the resource group name of Azure SQL Database servers - for example, myResourceGroup - and select the matched resource group from the search results.
  2. Select Delete resource group.
  3. In Enter resource group name to confirm deletion, enter the resource group name.
  4. Select Delete.
  5. Repeat steps 1-4 for the resource group of the Traffic Manager - for example, myResourceGroupTM1.
  6. In the Azure portal, in the search box, enter Backup vaults and select Backup vaults from the search results. You should see two Backup vaults listed - for example, aks-backup-vault-eastus-mjg032524 and aks-backup-vault-westus-mjg032524. For each of them, use the following steps:
    1. Select to open the Backup vault.
    2. Select Manage > Properties > Soft delete > Update. Next to Enable soft Delete, unselect the checkbox, and then select Update.
    3. Select Manage > Backup instances. Filter on the datasource type Kubernetes Services. Select the instance you created and then delete it.
  7. Wait until the two Backup instances are deleted.
  8. Repeat steps 1-4 for the resource group of the primary cluster - for example, liberty-aks-eastus-mjg032524.
  9. Repeat steps 1-4 for the resource group of the secondary cluster - for example, liberty-aks-westus-mjg032524.

Next steps

In this tutorial, you set up a WebSphere Liberty/Open Liberty HA/DR solution consisting of an active-passive application infrastructure tier with an active-passive database tier, and in which both tiers span two geographically different sites. At the first site, both the application infrastructure tier and the database tier are active. At the second site, the secondary domain is restored with Azure Backup, and the secondary database is on standby.

Continue to explore the following references for more options to build HA/DR solutions and run WebSphere on Azure: