Are NVIDIA driver updates needed on AKS or ML VMs?

Andrew Savory 0 Reputation points
2024-04-12T08:34:04.7633333+00:00

Hi there,

We received the message below from Azure about applying updates to some GPU virtual machines. However, the only VMs of this type we use are either part of an AKS node pool or serve as compute for an ML workspace endpoint. I'm fairly certain we shouldn't connect to these machines directly to apply updates, since they are supposed to be managed. Is this correct?

Update NVIDIA driver on NVads A10_v5 virtual machines by 31 May 2024

You're receiving this email because you currently use NVads A10 v5 series virtual machines.

The latest vGPU 17.x driver from NVIDIA is backward compatible only with vGPU 16.x.

To avoid disruptions to your service when we roll out vGPU 17.x, please update the NVIDIA driver in your NVads A10 v5 virtual machines to vGPU 16.x by 31 May 2024.

Azure Machine Learning
An Azure machine learning service for building and deploying models.
Azure Virtual Machines
An Azure service that is used to provision Windows and Linux virtual machines.
Azure Kubernetes Service (AKS)
An Azure service that provides serverless Kubernetes, an integrated continuous integration and continuous delivery experience, and enterprise-grade security and governance.

1 answer

  1. romungi-MSFT 42,191 Reputation points Microsoft Employee
    2024-04-12T10:37:13.6566667+00:00

    @Andrew Savory If you are using a managed online endpoint, the service should take care of this. If you are using an AKS online endpoint, node maintenance is the user's responsibility. See the differences between managed online endpoints and AKS endpoints.

    For upgrading the drivers, please see this page.
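
For the AKS case, it may help to first check which driver a node is running before deciding whether an update is needed. The following is a minimal sketch, not an official procedure. It assumes that `nvidia-smi` is available where the script runs (for example on the VM itself or inside a GPU pod), and that on Linux the vGPU 16.x guest drivers belong to the 535 driver branch. Neither assumption comes from this thread, so check both against the NVIDIA and Azure documentation for your OS. The function names are hypothetical.

```python
import subprocess

# Assumption (not stated in the thread): on Linux, vGPU 16.x guest drivers
# are in the 535 driver branch. Windows driver numbering differs.
ASSUMED_MIN_BRANCH = 535


def driver_branch(version: str) -> int:
    """Return the leading branch number of a version such as '535.161.08'."""
    return int(version.strip().split(".")[0])


def needs_update(version: str, min_branch: int = ASSUMED_MIN_BRANCH) -> bool:
    """True when the driver is older than the assumed minimum branch."""
    return driver_branch(version) < min_branch


def installed_driver_version() -> str:
    """Ask nvidia-smi for the driver version reported for the first GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return out.splitlines()[0].strip()
```

Calling `needs_update(installed_driver_version())` on a GPU node then gives a rough yes/no answer under the assumptions above; the actual upgrade steps are those on the page linked in the answer.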
