Hello @Patrick Deubel
Thank you for reaching out to the Microsoft Q&A platform.
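The "Standard_NV12ads_A10_v5" node pool in your AKS cluster reports CUDA v11.6 because of the NVIDIA driver installed on its nodes, not because of a limit of the GPU itself. The CUDA version reported on a node (for example by nvidia-smi) is the highest CUDA version that the installed driver supports, so a node with an older driver reports an older CUDA version. This VM size is based on the NVIDIA A10 GPU, which is an Ampere-generation GPU and is not derived from the Tesla V100. The NV12ads size exposes a partial A10 (one third of the GPU, with 8 GB of GPU memory), which reduces the available GPU resources but does not change which CUDA versions the hardware can run.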
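To use CUDA v12, you do not need a different GPU, because the A10 supports it. What you need is a driver on the nodes that is new enough: on Linux, CUDA 12.x requires NVIDIA driver 525.60.13 or later. One option is to create a node pool where you manage the GPU driver yourself instead of relying on the version that AKS installs by default; please check the AKS GPU documentation for the options currently supported for this VM series, since NVadsA10 v5 sizes use partial GPUs and need NVIDIA GRID (vGPU) drivers. You can also create a new node pool with another GPU, such as the NVIDIA A100, and deploy your application there, but this only helps if those nodes come with a newer driver. A custom container image with CUDA v12 installed will likewise only work if the driver on the host node supports CUDA v12.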
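Whether a node can run CUDA v12 depends on its NVIDIA driver version, so it can help to check this before deploying. The following is a minimal sketch, assuming the minimum Linux driver for CUDA 12.x is 525.60.13 as listed in NVIDIA's CUDA release notes; the function names are hypothetical and only for illustration.

```python
# Assumption: minimum Linux driver for CUDA 12.x, taken from NVIDIA's
# CUDA release notes. Verify it against the CUDA release you plan to use.
MIN_CUDA12_LINUX_DRIVER = (525, 60, 13)


def parse_driver_version(version: str) -> tuple:
    """Turn a driver string such as '510.47.03' into (510, 47, 3)."""
    return tuple(int(part) for part in version.strip().split("."))


def supports_cuda12(driver_version: str) -> bool:
    """Return True if the driver is at least the assumed CUDA 12 minimum."""
    return parse_driver_version(driver_version) >= MIN_CUDA12_LINUX_DRIVER


if __name__ == "__main__":
    # Hypothetical sample values, not read from a real node.
    for version in ("510.47.03", "525.60.13", "535.54.03"):
        print(version, supports_cuda12(version))
```

On a GPU node, the driver string can be read with `nvidia-smi --query-gpu=driver_version --format=csv,noheader` and passed to `supports_cuda12`.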
To automate scaling, you can combine two Kubernetes components. The Horizontal Pod Autoscaler (HPA) scales the number of pods based on metrics such as CPU or memory utilization. The Cluster Autoscaler (CA) scales the number of nodes in the node pool: it adds nodes when pods cannot be scheduled because resources are missing (for example, no free GPU) and removes nodes that stay underutilized. Used together, the HPA adds pods when load increases and the CA adds nodes when those pods no longer fit on the existing ones.
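To illustrate how the HPA decides on a replica count, here is a minimal sketch of the scaling rule described in the Kubernetes documentation: desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). It is a simplification, not the actual controller code: the function name and the default limits are hypothetical, and the real HPA also handles missing metrics, pod readiness, and stabilization windows.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Simplified HPA rule: scale replicas by the ratio current/target."""
    ratio = current_metric / target_metric
    # Within the tolerance band the HPA leaves the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds (hypothetical defaults in this sketch).
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 2 pods at 90% average CPU with a 60% target.
    print(desired_replicas(2, 90, 60))
```

If each pod requests one GPU and the new replicas do not fit on the existing nodes, they stay pending, and that is the signal the CA uses to add nodes to the node pool.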
If this helps, please click "Accept as answer" and give it a thumbs-up.