AKS 1.22.11 - node(s) had volume node affinity conflict

Zoheb 6 Reputation points
2022-08-24T15:26:47.833+00:00

Use case: Back up an AKS PVC (using Velero) from a cluster running Azure AKS 1.22.11 with no availability zones (AZs).

Restore it on an AKS 1.22.11 cluster that runs with multiple AZs.

I'm able to restore the PVC, PV, and Service using Velero. However, the pods are not coming up, failing with the errors "node(s) had volume node affinity conflict" and "pod didn't trigger scale-up".

I'd appreciate your suggestions.

Is it because the PVC I backed up was hosted in an AKS cluster with no AZs?

If I describe this PV in the source cluster, no zone is attached to it, which is expected because there are no AZs.

Even after the restore, the PV state is the same.
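
For reference, this is roughly how I'm comparing the PV's zone requirement with the target cluster's nodes (the PV name is a placeholder):

```bash
# Show the restored PV's node affinity (the zone requirement, if any)
kubectl get pv <pv-name> -o jsonpath='{.spec.nodeAffinity}{"\n"}'

# Show the zone label of each node in the target cluster
kubectl get nodes -L topology.kubernetes.io/zone
```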


2 answers

  1. shiva patpi 13,366 Reputation points Microsoft Employee Moderator
    2022-08-25T05:23:03.037+00:00

    Hello @Zoheb,
    This happens because the PVC that was previously attached to a non-zonal node is now trying to attach to a node in a zonal node pool, which may not be possible.
    For this error signature, detailed troubleshooting and mitigation steps are described at https://stackoverflow.com/questions/51946393/kubernetes-pod-warning-1-nodes-had-volume-node-affinity-conflict#:~:text=The%20error%20%22volume%20node%20affinity%20conflict%22%20happens%20when,see%20the%20details%20of%20all%20the%20Persistent%20Volumes.

    Another option is to try creating a new node pool without zones, for example:
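
    A rough sketch (resource group, cluster, and pool names are placeholders); a node pool added without the --zones flag is not pinned to any availability zone:

```bash
# Add a node pool with no availability zones (simply omit --zones)
az aks nodepool add \
  --resource-group <resource-group> \
  --cluster-name <aks-cluster> \
  --name nozonepool \
  --node-count 2
```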

    Kindly take a look at those discussions and let us know if that helps!

    Regards,
    Shiva.

    2 people found this answer helpful.

  2. Zoheb 6 Reputation points
    2022-08-25T09:45:39.643+00:00

    @shiva patpi - Thanks for your response and for sharing the relevant links.

    In my use case, I figured it out: the issue was the provisioner "kubernetes.io/azure-disk". Velero backup/restore does not work if the storage class provisioner is "kubernetes.io/azure-disk". I deployed the same app with the storage class provisioner "disk.csi.azure.com", and Velero restoration worked in the multi-zone cluster.

    At a high level:

    If the storage class provisioner is "disk.csi.azure.com", backing up the PVC from the no-AZ cluster and restoring it on the multi-AZ cluster with ZRS worked for me.

    If the storage class provisioner is "kubernetes.io/azure-disk", the same backup and restore onto the multi-AZ cluster with ZRS didn't work. I believe the CSI driver is required for this operation.
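
    For anyone checking their own setup, a quick sketch of how to see which provisioner is in play (the PVC name is a placeholder):

```bash
# Which provisioner backs each storage class (disk.csi.azure.com vs kubernetes.io/azure-disk)
kubectl get storageclass -o custom-columns=NAME:.metadata.name,PROVISIONER:.provisioner

# Which storage class a given PVC was created from
kubectl get pvc <pvc-name> -o jsonpath='{.spec.storageClassName}{"\n"}'
```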

    The major concern now is that Microsoft's recommendation is to migrate the in-tree storage classes (kubernetes.io/azure-disk) to CSI:
    https://learn.microsoft.com/en-us/azure/aks/csi-storage-drivers

    Migrating these storage classes involves deleting the existing ones and re-creating them with the provisioner set to disk.csi.azure.com for Azure Disks, or file.csi.azure.com for Azure Files, for example:
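
    For reference, a minimal sketch of a re-created CSI storage class using ZRS disks (the class name and disk SKU below are illustrative assumptions, not values from my cluster):

```bash
# Re-create the storage class with the CSI provisioner; a ZRS SKU is chosen as an example
cat <<'EOF' | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-csi-zrs
provisioner: disk.csi.azure.com
parameters:
  skuname: StandardSSD_ZRS
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
EOF
```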

    My PVCs already hold data. If I delete the existing storage class, re-create a new one, and point the running app at it, that will just create a new, empty PVC with the "disk.csi.azure.com" provisioner. But my application needs the old PVC's data, not a new PVC. How can we mitigate this? Is there a workaround?
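
    To make the question concrete, below is a rough sketch of the kind of manual re-attachment I'm wondering about (static provisioning of the existing disk through the CSI driver). All names, sizes, the storage class, and the resource ID are placeholders, and I don't know whether this is a supported path:

```bash
# 1) Keep the underlying Azure disk when the old PVC/PV is removed
kubectl patch pv <old-pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'

# 2) Re-attach the same managed disk as a statically provisioned CSI volume
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolume
metadata:
  name: migrated-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: managed-csi
  csi:
    driver: disk.csi.azure.com
    # Resource ID of the existing managed disk (placeholder)
    volumeHandle: /subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Compute/disks/<disk-name>
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: migrated-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: managed-csi
  volumeName: migrated-pv
  resources:
    requests:
      storage: 10Gi
EOF
```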

    I'd appreciate any suggestions or inputs.

    1 person found this answer helpful.
