Load Balancer GetVirtualMachineScaleSet looking for non-existent VM

Christiaan Westgeest 6 Reputation points
2020-01-16T10:32:52.93+00:00

Background

We wanted to change some of the types of VMs for our Kubernetes cluster to more resource-optimized ones.
To this end, we created two new NodePools with the new VMs and set the appropriate taints. We then added a new taint to the old Nodes and drained them.
All pods were moved over to the new Nodes as expected. We deleted the old NodePools.

Issue

After the above operation was completed, we experienced issues with ingress. It turned out the Backend Nodepool for the Load Balancer was one of the two that were drained and removed. After setting the Backend Nodepool to the newly created ScaleSet, things mostly started working again.

However, both current Load Balancers (one for the application, one for a Kibana dashboard) are still showing warnings.

The Load Balancer for the application was not re-created during the switch and kept its IP. It seems to be working after the Backend Nodepool was reset (the URL is reachable), but it shows the warnings below:

  Warning  GetVirtualMachineScaleSet  60s (x654 over 18h)  azure-cloud-provider  compute.VirtualMachineScaleSetsClient#Get: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code="ResourceNotFound" Message="The Resource 'Microsoft.Compute/virtualMachineScaleSets/aks-acceptance-29012786-vmss' under resource group '{masked-for-privacy}' was not found."
  Warning  LoadBalancerUpdateFailed   60s (x644 over 18h)  service-controller    Error updating load balancer with new hosts map[aks-acc-29012786-vmss000000:{} aks-datastorage-29012786-vmss000000:{} aks-datastorage-29012786-vmss000001:{} aks-devtest-29012786-vmss000000:{} aks-elastic-29012786-vmss000000:{} aks-elastic-29012786-vmss000001:{} aks-test-29012786-vmss000002:{}]: timed out waiting for the condition

The Load Balancer for the Kibana dashboard was deleted and recreated after the switch in the hope that this would resolve the warnings. It still shows the warnings below and also does not receive a new external IP address. Its status remains Pending.

  Normal   EnsuringLoadBalancer       2m6s (x170 over 14h)  service-controller    Ensuring load balancer
  Warning  GetVirtualMachineScaleSet  2m6s (x170 over 14h)  azure-cloud-provider  compute.VirtualMachineScaleSetsClient#Get: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code="ResourceNotFound" Message="The Resource 'Microsoft.Compute/virtualMachineScaleSets/aks-acceptance-29012786-vmss' under resource group '{masked-for-privacy}' was not found."

The Azure logs seem to suggest that it keeps trying to recreate both Load Balancers every few minutes.

The 'aks-acceptance-29012786-vmss' they're both looking for is one of the two old Nodepools, and no longer exists.
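
To double-check this against the events above, here is a minimal Python sketch that pulls the scale set name out of the 404 message and compares it with the scale sets implied by the hosts map. The function names are made up for illustration, and it assumes that each node name is the scale set name followed by a six-character instance suffix (as in "vmss000000"); that is my reading of the node names, not something the events confirm.

```python
import re

# Event text copied from the warnings above (the resource group stays masked).
NOT_FOUND_EVENT = (
    "The Resource 'Microsoft.Compute/virtualMachineScaleSets/"
    "aks-acceptance-29012786-vmss' under resource group "
    "'{masked-for-privacy}' was not found."
)
HOSTS_MAP_EVENT = (
    "Error updating load balancer with new hosts map["
    "aks-acc-29012786-vmss000000:{} aks-datastorage-29012786-vmss000000:{} "
    "aks-datastorage-29012786-vmss000001:{} aks-devtest-29012786-vmss000000:{} "
    "aks-elastic-29012786-vmss000000:{} aks-elastic-29012786-vmss000001:{} "
    "aks-test-29012786-vmss000002:{}]: timed out waiting for the condition"
)


def missing_scale_set(event):
    """Return the scale set name quoted in a ResourceNotFound message."""
    match = re.search(r"virtualMachineScaleSets/([^']+)'", event)
    if match is None:
        raise ValueError("no scale set name found in event text")
    return match.group(1)


def scale_sets_in_hosts_map(event):
    """Return the scale sets that the nodes in the hosts map belong to.

    Assumption: a node name is the scale set name plus a
    six-character instance suffix, so the suffix is simply cut off.
    """
    body = re.search(r"hosts map\[(.*?)\]", event)
    if body is None:
        raise ValueError("no hosts map found in event text")
    nodes = [entry.split(":")[0] for entry in body.group(1).split()]
    return {node[:-6] for node in nodes}


if __name__ == "__main__":
    missing = missing_scale_set(NOT_FOUND_EVENT)
    current = scale_sets_in_hosts_map(HOSTS_MAP_EVENT)
    print("scale set in 404 message:", missing)
    print("scale sets in hosts map: ", sorted(current))
    print("404 scale set is in hosts map:", missing in current)
```

Under that assumption, the scale set named in the 404 message does not appear among the scale sets in the hosts map, which matches the observation that it belongs to a deleted Nodepool.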

I've been trying to pin down why the Load Balancers keep referring to a non-existent ScaleSet, but so far without luck.
I'm also uncertain why the application Load Balancer cannot be updated with a new hosts map.

Does anyone have any inkling what's going on, or what further steps I can take to debug the issue?

Kind regards,
Chris

Azure Virtual Machines