AKS cluster on Kubernetes 1.23.5 is stuck in a failed state and cannot be restarted, accessed, or upgraded

Michael Schmidt Nissen 20 Reputation points
2023-01-19T09:31:09.3133333+00:00

Hi,

We have an AKS cluster running Kubernetes version 1.23.5 that is in a failed state, most probably because this version recently went out of support.

When trying to restart the cluster (stop, wait 30 minutes, then start), we get the following error message:

(ReconcileStandardLoadBalancerError) Reconcile standard load balancer failed. Details: outboundReconciler retry failed: Category: ClientError; Code: Unspecified; SubCode: InvalidRequestFormat_DuplicateResourceName; Message: ; InnerMessage: ; Dependency: Microsoft.Network/LoadBalancers; AKSTeam: Networking; OriginalError: Code="InvalidRequestFormat" Message="Cannot parse the request." Details=[{"code":"DuplicateResourceName","message":"Resource /subscriptions//resourceGroups//providers/Microsoft.Network/loadBalancers/ has two child resources with the same name (28548feb-70a2-4779-8711-89becdd51691)."}]; Retriable: false. Code: ReconcileStandardLoadBalancerError Message: Reconcile standard load balancer failed. Details: outboundReconciler retry failed: Category: ClientError; Code: Unspecified; SubCode: InvalidRequestFormat_DuplicateResourceName; Message: ; InnerMessage: ; Dependency: Microsoft.Network/LoadBalancers; AKSTeam: Networking; OriginalError: Code="InvalidRequestFormat" Message="Cannot parse the request." Details=[{"code":"DuplicateResourceName","message":"Resource /subscriptions//resourceGroups//providers/Microsoft.Network/loadBalancers/ has two child resources with the same name (28548feb-70a2-4779-8711-89becdd51691)."}]; Retriable: false.

When trying to upgrade the cluster to a higher, supported version, we get the same error. When trying to upgrade to the same version to pull the cluster out of its failed state, we also get the same error.

How can we get the cluster going again? Deleting and recreating the cluster is not an option.

Cheers,
Michael Schmidt Nissen
Back end developer @ ReMoni

Azure Kubernetes Service (AKS)

4 answers

  1. Cristian Gatjens 716 Reputation points Microsoft Employee
    2023-01-19T13:47:52.1266667+00:00

    Hello Michael,

    Thank you for reaching out & I hope you are doing well.

    Based on the information you have provided, you are attempting to restart your AKS cluster, but the operation fails with that error message. The part of the message that concerns me is this:

    /subscriptions//resourceGroups//providers/Microsoft.Network/loadBalancers/ has two child resources with the same name (28548feb-70a2-4779-8711-89becdd51691)."}]

    I suspect that 28548feb-70a2-4779-8711-89becdd51691 is the name of a public IP address listed under Frontend IP configuration on your load balancer.

    While researching this, I found a very similar error message described in the following Stack Overflow question:

    https://stackoverflow.com/questions/69559293/azure-kubernetes-service-aks-no-longer-able-to-create-new-nodepools

    Can you please check whether you see a duplicate name or IP address in that section? Your load balancer resource should be in your MC_ resource group.

    According to the Stack Overflow question, a workaround is to remove the IP configuration that causes the duplicate conflict.
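
    For readers who prefer the CLI to the portal for this check, here is a minimal sketch rather than a verified procedure. It prints any frontend IP configuration name that appears more than once. The `NODE_RG` variable and the load balancer name `kubernetes` are assumptions (AKS usually names its standard load balancer `kubernetes`, but confirm this in your MC_ resource group), and `duplicate_names` is a helper name made up for illustration.

    ```shell
    # Hypothetical helper: read one name per line on stdin and print
    # every name that occurs more than once.
    duplicate_names() {
      sort | uniq -d
    }

    # Assumed usage (needs a logged-in Azure CLI; NODE_RG is a placeholder
    # for the MC_ resource group, "kubernetes" is the assumed LB name):
    #   az network lb frontend-ip list \
    #     --resource-group "$NODE_RG" --lb-name kubernetes \
    #     --query "[].name" --output tsv | duplicate_names
    ```

    If this prints nothing, the saved configuration has no duplicate frontend names, and the conflict may instead arise in the request that AKS builds during reconciliation.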

    Please "Accept the answer" if the information helped you. This will help us and others in the community as well. Feel free to reply with any other questions or concerns.

    Hope this helps!


  2. Eddie Neto 1,231 Reputation points Microsoft Employee
    2023-01-20T09:34:09.05+00:00

    Hi @Michael Schmidt Nissen

    Thanks for reaching out on Microsoft Q&A.

    This issue is very common after the public IP of the standard load balancer (SLB) has been changed.

    From the error message, it seems there is a name conflict on your IP here: Microsoft.Network/LoadBalancers; Details=[{"code":"DuplicateResourceName","message": has two child resources with the same name (28548feb-70a2-4779-8711-89becdd51691)."}]

    Please revert the public IP address in the SLB to the correct IP to resolve the conflict, then retry the reconciliation of the cluster by running "az resource update --ids /subscriptions/<your subscription>/resourcegroups/<resource group of the AKS>/providers/Microsoft.ContainerService/managedClusters/<name of the AKS>"
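
    The reconcile command is easier to get right when the resource ID is assembled from its parts. The following is a minimal sketch under stated assumptions: `SUBSCRIPTION_ID`, `RESOURCE_GROUP`, and `CLUSTER_NAME` are placeholders, `aks_cluster_id` is a helper name made up for this example, and the `az` calls are left as comments because they need a logged-in Azure CLI. Note that the resource group here is the one that contains the AKS resource itself, not the MC_ resource group.

    ```shell
    # Hypothetical helper: build the resource ID of an AKS managed cluster
    # from subscription ID ($1), resource group ($2), and cluster name ($3).
    aks_cluster_id() {
      printf '/subscriptions/%s/resourcegroups/%s/providers/Microsoft.ContainerService/managedClusters/%s\n' "$1" "$2" "$3"
    }

    # Assumed usage with placeholder variables:
    #   az resource update --ids "$(aks_cluster_id "$SUBSCRIPTION_ID" "$RESOURCE_GROUP" "$CLUSTER_NAME")"
    #
    # Afterwards, the provisioning state can be read back with:
    #   az aks show --resource-group "$RESOURCE_GROUP" --name "$CLUSTER_NAME" \
    #     --query provisioningState --output tsv
    ```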

    • Also check whether a "RequestDisallowedByPolicy" error is blocking you.
    • Also check whether any other IP is selected under Outbound rules on the load balancer.

    Hope this helps. Please "Accept as Answer" if it helped, so that it can help others in the community looking for help on similar topics.


  3. AA_Marco Crank 0 Reputation points
    2023-02-15T03:20:20.0133333+00:00

    Hello,

    Sorry, but I wanted to jump in here because we are having the exact same issue. @Michael Schmidt Nissen, were you able to resolve this, or do any of the Microsoft employees on this thread have other suggestions on how to resolve it? The error calls out the frontend IP configuration of our load balancer, but we only have one listed.

    I am attempting to build a new node pool to change the size of our worker nodes. To do that, I had to upgrade the Kubernetes version. I selected the option to upgrade only the control plane, which seems to have worked because it reports the latest version, but somewhere in the process it threw the error described above about the duplicate resource (the frontend IP configuration), even though only one shows in the portal. Now the cluster is in a failed state and I am not able to add a new node pool. Any insight would be great.

    Much appreciated!


  4. Alexandru Pirvu 0 Reputation points
    2024-02-29T13:59:36.84+00:00

    I had the same problem. In my case, it turned out that I had a single Frontend IP configuration with 3 rules mapped to it (2 x load balancing rules for TCP ports 80 and 533 + 1 x outbound rule).

    I saw that the assigned IP address was the one used for the inbound and load balancing rules, so I attempted to create a new Frontend IP configuration whose name matched the public IP address ID of the outbound rule. I then got exactly the same error as described in this issue, saying that the name is already used. Perhaps the reconciliation process was attempting to do exactly what I was doing manually.

    Getting right to the point, in my case it worked once I made sure that:

    • there was one Frontend IP configuration named after the "Load balancing rules", with those rules and the load-balanced IP address assigned to it
    • there was one Frontend IP configuration named after the outbound IP address ID, with the aksOutboundRule assigned to it

    Not sure if having two separate IP addresses (1 for inbound and 1 for outbound) is regular practice, but this is how it was set up on a similar environment on my side. I hope this helps others; if not this particular configuration, then at least this area of configuration.

    Afterwards, I ran the reconciliation command @Eddie Neto suggested in this thread and all was well.
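
    To spot the starting situation described in this answer (one frontend carrying both load balancing rules and the outbound rule), something like the following sketch may help. It is an illustration only: `shared_frontends` is a made-up helper, `NODE_RG` and the load balancer name `kubernetes` are placeholders, and the `--query` paths in the commented usage are assumptions about the CLI output shape that have not been verified against every Azure CLI version.

    ```shell
    # Hypothetical helper: read "frontend-id kind" pairs on stdin, where kind
    # is "lb" or "outbound", and print each frontend that has both kinds.
    shared_frontends() {
      awk '{ seen[$1] = seen[$1] " " $2 }
           END { for (f in seen) if (seen[f] ~ / lb/ && seen[f] ~ / outbound/) print f }'
    }

    # Assumed usage (needs a logged-in Azure CLI):
    #   {
    #     az network lb rule list --resource-group "$NODE_RG" --lb-name kubernetes \
    #       --query "[].frontendIPConfiguration.id" --output tsv | sed 's/$/ lb/'
    #     az network lb outbound-rule list --resource-group "$NODE_RG" --lb-name kubernetes \
    #       --query "[].frontendIPConfigurations[].id" --output tsv | sed 's/$/ outbound/'
    #   } | shared_frontends
    ```

    Any frontend this prints serves both inbound and outbound traffic, which is the arrangement that was split into two Frontend IP configurations above. Whether that split is required in other environments is not established by this thread.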

