Azure demands upgrade from working 1.24 to failing AKS cluster Kubernetes version 1.25

Ulf Brehmer 0 Reputation points
2023-06-02T11:47:45.4566667+00:00

I got this nag from Microsoft:

Action required: Upgrade your Azure Kubernetes Service cluster to a supported Kubernetes version. AKS is retiring v1.24.x on 30 July 2023. We’ve detected that one or more AKS clusters in your subscription(s) are using Kubernetes v1.24.x or lower. To stay within supported versions and service-level agreements (SLA) you have up to 30 days after the version is removed to upgrade

At the time of writing, the latest version (1.25.x) introduces a faulty NVIDIA GPU driver system that causes our OpenCL builds to fail. We have already opened a ticket about this, yet Microsoft still insists on the upgrade. This is a problem, since we would be forced to re-architect our GPU-dependent systems and perhaps move away from Azure.

Azure Kubernetes Service (AKS)
An Azure service that provides serverless Kubernetes, an integrated continuous integration and continuous delivery experience, and enterprise-grade security and governance.

1 answer

  1. Cristian Gatjens 716 Reputation points Microsoft Employee
    2023-06-02T14:43:45.8966667+00:00

    Hello @Ulf Brehmer

    Thanks for reaching out and I hope you are doing well.

    I understand your concern about upgrading to 1.25.x because of the known NVIDIA GPU driver issue. However, these are notifications that we send proactively, mainly so that customers avoid falling onto unsupported versions and into best-effort support.

    AKS provides regular support to clusters running the latest GA minor version (N) through N-2, as described here:

    https://learn.microsoft.com/en-us/azure/aks/supported-kubernetes-versions?tabs=azure-cli#kubernetes-version-support-policy

    When a cluster is on N-3, it falls under what we call platform support:

    https://learn.microsoft.com/en-us/azure/aks/supported-kubernetes-versions?tabs=azure-cli#platform-support-policy

    The relevant section from the link above is:

    Platform support policy applies to clusters in an n-3 version (where n is the latest supported AKS GA minor version), before the cluster drops to n-4. For example, kubernetes v1.25 will be considered platform support when v1.28 is the latest GA version. However, during the v1.29 GA release, v1.25 will then be auto-upgraded to v1.26.

    So, with your cluster running 1.24.x and the recent GA release of 1.27, your cluster is on N-3 and falls under platform support. Microsoft will not upgrade your cluster on your behalf, but we still recommend upgrading to the latest versions.

    Do you know if the NVIDIA driver issue also exists on version 1.26.x? That version has been GA since April 2023.

    Please "Accept the answer" if the information helped you. Feel free to reply with any other questions or concerns.

    Hope this helps!
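
The support tiers described in the answer can be worked through with a small calculation. The following is only a minimal sketch of the N-2 / N-3 rule as quoted above, not an official AKS tool: the function names `minor_of` and `support_tier` and the tier labels are hypothetical, and it assumes versions of the form `1.x` or `1.x.y`.

```python
def minor_of(version: str) -> int:
    """Extract the minor number from a version such as '1.24.9' (hypothetical helper)."""
    parts = version.split(".")
    if len(parts) < 2:
        raise ValueError(f"expected 'major.minor[.patch]', got {version!r}")
    return int(parts[1])


def support_tier(cluster_version: str, latest_ga_version: str) -> str:
    """Classify a cluster by how far it lags the latest GA minor version (n).

    Assumption: n through n-2 get regular support and n-3 gets platform
    support, as stated in the policy text quoted in this thread.
    """
    lag = minor_of(latest_ga_version) - minor_of(cluster_version)
    if lag < 0:
        raise ValueError("cluster version is newer than the latest GA version")
    if lag <= 2:
        return "regular support"
    if lag == 3:
        return "platform support"
    return "below platform support (n-4 or older)"


if __name__ == "__main__":
    # The situation in this thread: cluster on 1.24.x, latest GA is 1.27.
    print(support_tier("1.24.9", "1.27"))
    # The example from the quoted policy: 1.25 when 1.28 is the latest GA.
    print(support_tier("1.25", "1.28"))
```

Under these assumptions, a 1.24.x cluster is classified as platform support while 1.27 is the latest GA version, which matches the conclusion in the answer. The patch number `9` in the example is arbitrary and only illustrates the parsing.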