Load Balancer GetVirtualMachineScaleSet looking for non-existent VM scale set
Background
We wanted to switch some of the VM types in our Kubernetes cluster to more resource-optimized ones.
To this end, we created two new NodePools with the new VMs and set the appropriate taints. We then added a new taint to the old Nodes and drained them.
All pods were moved over to the new Nodes as expected. We deleted the old NodePools.
Issue
After this operation was completed, we experienced issues with ingress. It turned out that the Backend Nodepool for the Load Balancer was one of the two that were drained and removed. After setting the Backend Nodepool to the newly created ScaleSet, things mostly started working again.
However, both current Load Balancers (one for the application, one for a Kibana dashboard) are still showing warnings.
The Load Balancer for the application was not re-created during the switch and kept its IP. It appears to work now that the Backend Nodepool has been reset (the URL is reachable), but it shows the warnings below:
Warning GetVirtualMachineScaleSet 60s (x654 over 18h) azure-cloud-provider compute.VirtualMachineScaleSetsClient#Get: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code="ResourceNotFound" Message="The Resource 'Microsoft.Compute/virtualMachineScaleSets/aks-acceptance-29012786-vmss' under resource group '{masked-for-privacy}' was not found."
Warning LoadBalancerUpdateFailed 60s (x644 over 18h) service-controller Error updating load balancer with new hosts map[aks-acc-29012786-vmss000000:{} aks-datastorage-29012786-vmss000000:{} aks-datastorage-29012786-vmss000001:{} aks-devtest-29012786-vmss000000:{} aks-elastic-29012786-vmss000000:{} aks-elastic-29012786-vmss000001:{} aks-test-29012786-vmss000002:{}]: timed out waiting for the condition
The Load Balancer for the Kibana dashboard was deleted and recreated after the switch, in the hope that this would resolve the warnings. It still shows the warnings below and also does not receive a new external IP address; its status remains Pending.
Normal EnsuringLoadBalancer 2m6s (x170 over 14h) service-controller Ensuring load balancer
Warning GetVirtualMachineScaleSet 2m6s (x170 over 14h) azure-cloud-provider compute.VirtualMachineScaleSetsClient#Get: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code="ResourceNotFound" Message="The Resource 'Microsoft.Compute/virtualMachineScaleSets/aks-acceptance-29012786-vmss' under resource group '{masked-for-privacy}' was not found."
The Azure logs suggest that the cluster retries creating both Load Balancers every few minutes.
The scale set both are looking for, 'aks-acceptance-29012786-vmss', belongs to one of the two old NodePools and no longer exists.
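To double-check that none of the current nodes belong to that scale set, the hosts map from the LoadBalancerUpdateFailed event can be reduced to scale set names. This is only a rough sketch: the function name is made up for illustration, and it assumes that every node name is the scale set name followed by a six-character instance suffix (as in 'aks-acc-29012786-vmss000000').

```python
import re

# Hosts map copied verbatim from the LoadBalancerUpdateFailed event.
HOSTS_MAP = (
    "map[aks-acc-29012786-vmss000000:{} aks-datastorage-29012786-vmss000000:{} "
    "aks-datastorage-29012786-vmss000001:{} aks-devtest-29012786-vmss000000:{} "
    "aks-elastic-29012786-vmss000000:{} aks-elastic-29012786-vmss000001:{} "
    "aks-test-29012786-vmss000002:{}]"
)

# Scale set named in the GetVirtualMachineScaleSet 404 warnings.
MISSING_VMSS = "aks-acceptance-29012786-vmss"


def scale_sets_from_hosts_map(hosts_map):
    """Return the set of scale set names referenced by a hosts map string.

    Assumption: each node name is '<scale set name>' plus a six-character
    instance suffix, so dropping the last six characters yields the scale set.
    """
    node_names = re.findall(r"([\w-]+):\{\}", hosts_map)
    return {name[:-6] for name in node_names}


if __name__ == "__main__":
    current = scale_sets_from_hosts_map(HOSTS_MAP)
    for vmss in sorted(current):
        print(vmss)
    print("missing scale set still referenced by a node:", MISSING_VMSS in current)
```

For the map above this lists five scale sets, including 'aks-acc-29012786-vmss', and the old 'aks-acceptance-29012786-vmss' is not among them. So the reference to it does not seem to come from the node list the service controller passes along.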
I've been trying to pin down why the Load Balancers keep referring to a non-existent ScaleSet, but so far without luck.
I'm also uncertain why the application Load Balancer cannot be updated with a new hosts map.
Does anyone have any inkling what's going on, or what further steps I can take to debug the issue?
Kind regards,
Chris