Node count in AKS increased without approval/manual operation after SP renewal

Anonymous
2025-02-27T12:53:40.7033333+00:00

After renewing the service principal (SP), the node count in the agent pools increased even though scaling was set to manual.

Azure Kubernetes Service
An Azure service that provides serverless Kubernetes, an integrated continuous integration and continuous delivery experience, and enterprise-grade security and governance.

4 answers

  1. Anonymous
    2025-02-27T15:07:37.1866667+00:00

    Hi Sajin Seby,

    Welcome to the Microsoft Q&A Platform! Thank you for asking your question here.

    We understand from your query that the node count increased after you renewed the service principal (SP).

    The increase in node count may be due to automated functionality in AKS, such as the Cluster Autoscaler, having been activated after the SP renewal. Renewing the service principal may have restored permissions or re-enabled a policy that allowed autoscaling to proceed, particularly if workloads were requesting more resources.

    It's also possible that automated tasks or scripts tied to the SP renewal inadvertently triggered scaling. To avoid this in the future, I'd recommend verifying the autoscaling configuration of each node pool and reviewing any automation associated with the SP to make sure everything is set up as intended.
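One way to verify the autoscaling configuration is to inspect the JSON returned by `az aks nodepool list --resource-group <rg> --cluster-name <cluster> -o json`. The sketch below is only an illustration: it assumes the output uses the `name`, `count`, `enableAutoScaling`, `minCount`, and `maxCount` fields, and the sample data and the function name `autoscaling_report` are hypothetical.

```python
import json


def autoscaling_report(nodepools_json):
    """Summarise the scaling mode of each node pool.

    Expects the JSON text produced by `az aks nodepool list -o json`
    (field names are an assumption about that output).
    """
    report = []
    for pool in json.loads(nodepools_json):
        autoscale = bool(pool.get("enableAutoScaling"))
        report.append({
            "name": pool.get("name"),
            "mode": "autoscale" if autoscale else "manual",
            "count": pool.get("count"),
            # min/max only matter when the autoscaler is enabled
            "min": pool.get("minCount") if autoscale else None,
            "max": pool.get("maxCount") if autoscale else None,
        })
    return report


# Hypothetical sample data, not taken from the cluster in this thread.
SAMPLE_POOLS = json.dumps([
    {"name": "system", "count": 1, "enableAutoScaling": False,
     "minCount": None, "maxCount": None},
    {"name": "user", "count": 2, "enableAutoScaling": True,
     "minCount": 1, "maxCount": 5},
])

if __name__ == "__main__":
    for row in autoscaling_report(SAMPLE_POOLS):
        print(row)
```

A pool reported as "manual" should not be resized by the Cluster Autoscaler, so a count change there would point to some other actor (a script, a pipeline, or an infrastructure-as-code tool).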

    If this answer was helpful, please don't forget to upvote it and accept it as the answer.

    Thank you.


  2. Arko 4,150 Reputation points Microsoft External Staff Moderator
    2025-03-25T14:49:56.3566667+00:00

    Hi Sajin Seby,

    Thank you for sharing the details regarding the unexpected node count increase in your AKS cluster. After a thorough investigation and discussions with the Microsoft product team, here are the key findings:

    Your AKS cluster is currently running Kubernetes version 1.28.5, which reached end-of-life in January 2025. As per Microsoft's AKS support policy, clusters running unsupported versions do not receive runtime guarantees, and unexpected behavior may occur. We recommend that you upgrade your cluster to at least Kubernetes 1.30 to ensure continued support and stability.

    No forced scale-up events were detected in the timeframe of interest (27th February 2025, between 10 AM – 12 PM IST). The kubectl get events and az vmss list-instances commands did not reveal any unexpected scaling activities.
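A check of this kind can be sketched by filtering the output of `kubectl get events -A -o json` for autoscaler-related events inside the window of interest. This is a rough illustration, not the procedure the support team used: the `TriggeredScaleUp` reason is an assumption about what the Cluster Autoscaler emits, the function names are hypothetical, and the sample events are invented for the example.

```python
import json
from datetime import datetime, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30))


def parse_ts(value):
    """Parse a Kubernetes timestamp such as 2025-02-27T05:00:00Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def scaling_events(events_json, start, end, reasons=frozenset({"TriggeredScaleUp"})):
    """Return (time, reason, message) for matching events within [start, end]."""
    hits = []
    for ev in json.loads(events_json).get("items", []):
        ts = ev.get("lastTimestamp") or ev.get("eventTime")
        if not ts or ev.get("reason") not in reasons:
            continue
        when = parse_ts(ts)
        if start <= when <= end:
            hits.append((when, ev["reason"], ev.get("message", "")))
    return sorted(hits)


# Hypothetical sample events, invented for illustration only.
SAMPLE_EVENTS = json.dumps({"items": [
    {"reason": "TriggeredScaleUp", "lastTimestamp": "2025-02-27T05:00:00Z",
     "message": "pod triggered scale-up"},
    {"reason": "TriggeredScaleUp", "lastTimestamp": "2025-02-27T08:00:00Z",
     "message": "outside the window"},
    {"reason": "Pulled", "lastTimestamp": "2025-02-27T05:15:00Z",
     "message": "image pulled"},
]})

if __name__ == "__main__":
    window_start = datetime(2025, 2, 27, 10, 0, tzinfo=IST)
    window_end = datetime(2025, 2, 27, 12, 0, tzinfo=IST)
    for when, reason, message in scaling_events(SAMPLE_EVENTS, window_start, window_end):
        print(when.astimezone(IST).isoformat(), reason, message)
```

An empty result for the real cluster's events would be consistent with the finding above that no scale-up occurred in that window.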

    Most importantly, the node pool always had a count of 1.

    From our internal logs for your agent pool snapshot data, it was confirmed that the discussed node pool had a consistent node count of 1 since 8th January 2025. There was no sudden scale-up on 27th February 2025, as the node count was already set to 1 before this date.

    As discussed by private message, your cluster is managed via Terraform. Although no Terraform apply operation was explicitly identified during the timeframe, it is possible that an earlier Terraform state enforcement contributed to the node count being maintained at 1.
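To see whether Terraform intends to change a node count, one option is to export a plan with `terraform plan -out=tfplan` followed by `terraform show -json tfplan` and scan it for `node_count` differences. The sketch below assumes the cluster is defined with the `azurerm_kubernetes_cluster` and `azurerm_kubernetes_cluster_node_pool` resource types; the resource addresses and values in the sample are hypothetical.

```python
import json


def _default_pool_count(state):
    """Read node_count from the nested default_node_pool block, if present."""
    pools = state.get("default_node_pool") or []
    return pools[0].get("node_count") if pools else None


def node_count_changes(plan_json):
    """List (address, before, after) for resources whose node_count would change."""
    changes = []
    for rc in json.loads(plan_json).get("resource_changes", []):
        change = rc.get("change") or {}
        before = change.get("before") or {}
        after = change.get("after") or {}
        if rc.get("type") == "azurerm_kubernetes_cluster_node_pool":
            pair = (before.get("node_count"), after.get("node_count"))
        elif rc.get("type") == "azurerm_kubernetes_cluster":
            pair = (_default_pool_count(before), _default_pool_count(after))
        else:
            continue  # unrelated resource
        if pair[0] != pair[1]:
            changes.append((rc.get("address"), pair[0], pair[1]))
    return changes


# Hypothetical plan fragment, invented for illustration only.
SAMPLE_PLAN = json.dumps({"resource_changes": [
    {"address": "azurerm_kubernetes_cluster.main",
     "type": "azurerm_kubernetes_cluster",
     "change": {"actions": ["no-op"],
                "before": {"default_node_pool": [{"node_count": 1}]},
                "after": {"default_node_pool": [{"node_count": 1}]}}},
    {"address": "azurerm_kubernetes_cluster_node_pool.user",
     "type": "azurerm_kubernetes_cluster_node_pool",
     "change": {"actions": ["update"],
                "before": {"node_count": 3},
                "after": {"node_count": 1}}},
]})

if __name__ == "__main__":
    for address, before, after in node_count_changes(SAMPLE_PLAN):
        print(f"{address}: node_count {before} -> {after}")
```

If a plan shows Terraform resetting a count to the configured value, that would match the kind of state enforcement described above.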

    Please be assured, there was no sudden scale-up event detected.


  3. LISBOA-4826 245 Reputation points Volunteer Moderator
    2025-04-26T12:48:02.02+00:00

    Hi Sajin Seby

    I hope you are doing well.

    Could you please let us know if this issue is still happening?

    Also, after renewing a service principal (SP) in Azure, it's possible for nodes in your Azure Kubernetes Service (AKS) cluster to scale up, even if you're using a manual scaling configuration. This can happen because the SP renewal might inadvertently trigger a change in the cluster's authentication or authorization settings, leading to unexpected scaling behavior.

    Here is the reference for that: https://learn.microsoft.com/en-us/azure/aks/update-credentials#reset-the-existing-service-principal-credentials

    Troubleshooting:

    If you've experienced this issue, you should:

    Examine the scaling history: Check the Azure portal or the AKS cluster logs to see if there's any indication of when the node scaling occurred and what might have triggered it.

    Review Cluster Autoscaler settings.

    Verify SP Permissions.

    Consider Managed Identities: Instead of relying on SPs, consider using Managed Identities for authentication, as they offer more automated management of credentials.

    Consult Microsoft Support: If you're still struggling to resolve the issue, you may need to file a support ticket with Microsoft for deeper investigation.
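For the service principal and managed identity items in the list above, the output of `az aks show --resource-group <rg> --name <cluster> -o json` indicates which kind of identity the cluster uses. The sketch below assumes that managed-identity clusters report an `identity.type` value and a `servicePrincipalProfile.clientId` of `msi`, while SP-based clusters report the SP's application ID there. The function name and sample values are hypothetical.

```python
import json


def cluster_identity_kind(cluster_json):
    """Classify a cluster as using a managed identity or a service principal."""
    cluster = json.loads(cluster_json)
    client_id = (cluster.get("servicePrincipalProfile") or {}).get("clientId")
    identity_type = (cluster.get("identity") or {}).get("type")
    if identity_type and identity_type.lower() != "none":
        return "managed identity"
    if client_id and client_id.lower() != "msi":
        return "service principal"
    return "unknown"


# Hypothetical sample outputs, invented for illustration only.
SAMPLE_SP_CLUSTER = json.dumps({
    "identity": None,
    "servicePrincipalProfile": {"clientId": "00000000-0000-0000-0000-000000000000"},
})
SAMPLE_MI_CLUSTER = json.dumps({
    "identity": {"type": "SystemAssigned"},
    "servicePrincipalProfile": {"clientId": "msi"},
})

if __name__ == "__main__":
    print(cluster_identity_kind(SAMPLE_SP_CLUSTER))
    print(cluster_identity_kind(SAMPLE_MI_CLUSTER))
```

A cluster classified as "service principal" is one where credential renewal applies, and therefore a candidate for migration to a managed identity.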

    Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you; this can be beneficial to other community members.

    If you have any other questions or are still running into issues, let me know in the comments and I would be happy to help.

    Thank You.

    Lisboa


  4. LISBOA-4826 245 Reputation points Volunteer Moderator
    2025-05-31T11:08:48.76+00:00

    Hi Sajin Seby,

    Can you please provide feedback on our last answer? By design, renewing the SP triggers the recreation of the instances inside the AKS node pool.

    Thank You.

    Lisboa

