AKS cluster k8s version 1.23.5 is locked in a failed state and cannot be restarted, accessed or upgraded

Michael Schmidt Nissen 20 Reputation points
2023-01-19T09:31:09.3133333+00:00

Hi,

We have an AKS cluster running Kubernetes version 1.23.5 that is stuck in a failed state, most probably because this version recently went out of support.

When trying to restart the cluster (stop, wait 30 minutes, then start), we get the following error message:

(ReconcileStandardLoadBalancerError) Reconcile standard load balancer failed. Details: outboundReconciler retry failed: Category: ClientError; Code: Unspecified; SubCode: InvalidRequestFormat_DuplicateResourceName; Message: ; InnerMessage: ; Dependency: Microsoft.Network/LoadBalancers; AKSTeam: Networking; OriginalError: Code="InvalidRequestFormat" Message="Cannot parse the request." Details=[{"code":"DuplicateResourceName","message":"Resource /subscriptions//resourceGroups//providers/Microsoft.Network/loadBalancers/ has two child resources with the same name (28548feb-70a2-4779-8711-89becdd51691)."}]; Retriable: false. Code: ReconcileStandardLoadBalancerError Message: Reconcile standard load balancer failed. Details: outboundReconciler retry failed: Category: ClientError; Code: Unspecified; SubCode: InvalidRequestFormat_DuplicateResourceName; Message: ; InnerMessage: ; Dependency: Microsoft.Network/LoadBalancers; AKSTeam: Networking; OriginalError: Code="InvalidRequestFormat" Message="Cannot parse the request." Details=[{"code":"DuplicateResourceName","message":"Resource /subscriptions//resourceGroups//providers/Microsoft.Network/loadBalancers/ has two child resources with the same name (28548feb-70a2-4779-8711-89becdd51691)."}]; Retriable: false.

When trying to upgrade the cluster to a higher, supported version, we get the same error. When trying to upgrade to the same version to pull the cluster out of its failed state, we also get the same error.

How can we get the cluster going again? Deleting and recreating the cluster is not an option.

Cheers,
Michael Schmidt Nissen
Back end developer @ ReMoni

Azure Kubernetes Service (AKS)
An Azure service that provides serverless Kubernetes, an integrated continuous integration and continuous delivery experience, and enterprise-grade security and governance.

3 answers

  1. Cristian Gatjens 716 Reputation points Microsoft Employee
    2023-01-19T13:47:52.1266667+00:00

    Hello Michael,

    Thank you for reaching out & I hope you are doing well.

    Based on the information you have provided, you are attempting to restart your AKS cluster, but the operation fails with that error message. The part of the message that concerns me is this:

    /subscriptions//resourceGroups//providers/Microsoft.Network/loadBalancers/ has two child resources with the same name (28548feb-70a2-4779-8711-89becdd51691)."}]

    I suspect that 28548feb-70a2-4779-8711-89becdd51691 is the name of a public IP address entry listed under Frontend IP configuration on your load balancer.

    While researching this, I found a very similar error message described in the following external Stack Overflow question:

    https://stackoverflow.com/questions/69559293/azure-kubernetes-service-aks-no-longer-able-to-create-new-nodepools

    Can you please check whether you see a duplicate name or IP in that section? Your load balancer resource should be in your MC_ resource group.

    According to the Stack Overflow question, a workaround is to remove the IP configuration to resolve the duplicate conflict.
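
    If the portal view is inconclusive, the duplicate check described above can also be run against the load balancer's JSON definition. The following is only a minimal sketch, not an official tool: it assumes you have exported the load balancer as JSON (for example with `az network lb show --resource-group <MC_ resource group> --name <load balancer name> --output json`) and that each child collection is a list of objects with a `name` field. The function name `find_duplicate_child_names` is made up for illustration.

```python
from collections import Counter


def find_duplicate_child_names(lb: dict) -> dict:
    """Return {collection_key: [duplicated names]} for a load balancer document.

    Assumption: child resources (frontend IP configurations, outbound rules,
    and so on) appear as top-level lists of objects that carry a "name" field.
    """
    duplicates = {}
    for key, value in lb.items():
        # Skip scalar properties such as "name" or "location".
        if not isinstance(value, list):
            continue
        names = [
            item["name"]
            for item in value
            if isinstance(item, dict) and isinstance(item.get("name"), str)
        ]
        # Compare case-insensitively so "abc" and "ABC" count as the same name.
        counts = Counter(name.lower() for name in names)
        repeated = sorted(name for name, n in counts.items() if n > 1)
        if repeated:
            duplicates[key] = repeated
    return duplicates
```

    Load the exported file with `json.load` and pass the resulting dictionary to the function. An empty result means no collection in that document contains a repeated name, which would suggest that the conflict is introduced by the request AKS sends during reconciliation rather than by what is currently stored.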

    Please "Accept the answer" if the information helped you. This will help us and others in the community as well. Feel free to reply with any other questions or concerns.

    Hope this helps!


  2. Eddie Neto 1,236 Reputation points Microsoft Employee
    2023-01-20T09:34:09.05+00:00

    Hi @Michael Schmidt Nissen

    Thanks for reaching out on Microsoft Q&A.

    This issue commonly occurs after the public IP of the Standard Load Balancer (SLB) has been changed.

    From the error message, we can see that there appears to be a name conflict involving your IP here: Microsoft.Network/LoadBalancers; Details=[{"code":"DuplicateResourceName","message": has two child resources with the same name (28548feb-70a2-4779-8711-89becdd51691)."}]

    Please revert the public IP address in the SLB back to the correct IP to resolve the conflict, then retry the reconcile of the cluster by running "az resource update --ids /subscriptions/<your subscription>/resourcegroups/<resource group of the AKS>/providers/Microsoft.ContainerService/managedClusters/<name of the AKS>"

    • Also check whether a "RequestDisallowedByPolicy" error is blocking you.
    • Also check whether any other IP is selected under Outbound Rules on the load balancer.

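
    The resource ID passed to the reconcile command above is easy to get wrong by hand, so here is a small sketch that assembles it from its three parts and rejects empty values. The helper names `managed_cluster_id` and `reconcile_command` are hypothetical; only the `az resource update --ids` call and the ID layout come from the command quoted above.

```python
from typing import List


def managed_cluster_id(subscription_id: str, resource_group: str, cluster_name: str) -> str:
    """Build the managed cluster resource ID in the layout quoted in this answer."""
    parts = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "cluster_name": cluster_name,
    }
    for label, value in parts.items():
        # An empty segment would produce an ID like /subscriptions//resourcegroups/...
        if not value or not value.strip():
            raise ValueError(f"{label} must not be empty")
    return (
        f"/subscriptions/{subscription_id.strip()}"
        f"/resourcegroups/{resource_group.strip()}"
        f"/providers/Microsoft.ContainerService/managedClusters/{cluster_name.strip()}"
    )


def reconcile_command(subscription_id: str, resource_group: str, cluster_name: str) -> List[str]:
    """Return the argument list for the reconcile call; this does not execute anything."""
    cluster_id = managed_cluster_id(subscription_id, resource_group, cluster_name)
    return ["az", "resource", "update", "--ids", cluster_id]
```

    Use the resource group that contains the AKS resource itself here, not the MC_ resource group that holds the load balancer. The returned list can be joined with spaces for pasting into a shell or passed to `subprocess.run`.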

    Hope this helps. Please "Accept as Answer" if it helped, so that it can help others in the community looking for help on similar topics.


  3. AA_Marco Crank 0 Reputation points
    2023-02-15T03:20:20.0133333+00:00

    Hello,

    Sorry, but I wanted to jump in here because we are having the exact same issue. @Michael Schmidt Nissen were you able to resolve this, or do any of the Microsoft employees on this thread have other suggestions on how to resolve it? The error calls out the front end IP configuration of our load balancer, but we only have one listed.

    I am attempting to build a new node pool to change the size of our worker nodes. In order to do that, I had to upgrade the K8s version. I checked the option to upgrade just the control plane, which seems to have worked, as it is reporting the latest version. However, somewhere in the process it threw the error described above about the duplicate resource (FE IP config), even though we only have one showing in the portal. Now the cluster is in a failed state and I am not able to add a new node pool. Any insight would be great.

    Much appreciated!