Impossible to activate HugePage on AKS nodes

Gregory Esnaud 96 Reputation points
2021-01-20T09:32:04.537+00:00

Hi dear Azure community,

I'm struggling to activate HugePages on an AKS cluster.

  1. I noticed that I first have to configure a node pool with HugePages support.
    The only official Azure HugePages doc is about transparentHugePage (https://learn.microsoft.com/en-us/azure/aks/custom-node-configuration), but I don't know if that is sufficient...
  2. Then I know that I also have to configure the pod.
    I wanted to rely on this (https://kubernetes.io/docs/tasks/manage-hugepages/scheduling-hugepages/), but since step 1 is not working, I can't get that far...

But despite everything I've tried, I could not make it work.

If I follow the Microsoft documentation, my node pool is created like this:

    "kubeletConfig": {
      "allowedUnsafeSysctls": null,
      "cpuCfsQuota": null,
      "cpuCfsQuotaPeriod": null,
      "cpuManagerPolicy": null,
      "failSwapOn": false,
      "imageGcHighThreshold": null,
      "imageGcLowThreshold": null,
      "topologyManagerPolicy": null
    },
    "linuxOsConfig": {
      "swapFileSizeMb": null,
      "sysctls": {
        "fsAioMaxNr": null,
        "fsFileMax": null,
        "fsInotifyMaxUserWatches": null,
        "fsNrOpen": null,
        "kernelThreadsMax": null,
        "netCoreNetdevMaxBacklog": null,
        "netCoreOptmemMax": null,
        "netCoreRmemMax": null,
        "netCoreSomaxconn": null,
        "netCoreWmemMax": null,
        "netIpv4IpLocalPortRange": "32000 60000",
        "netIpv4NeighDefaultGcThresh1": null,
        "netIpv4NeighDefaultGcThresh2": null,
        "netIpv4NeighDefaultGcThresh3": null,
        "netIpv4TcpFinTimeout": null,
        "netIpv4TcpKeepaliveProbes": null,
        "netIpv4TcpKeepaliveTime": null,
        "netIpv4TcpMaxSynBacklog": null,
        "netIpv4TcpMaxTwBuckets": null,
        "netIpv4TcpRmem": null,
        "netIpv4TcpTwReuse": null,
        "netIpv4TcpWmem": null,
        "netIpv4TcpkeepaliveIntvl": null,
        "netNetfilterNfConntrackBuckets": null,
        "netNetfilterNfConntrackMax": null,
        "vmMaxMapCount": null,
        "vmSwappiness": null,
        "vmVfsCachePressure": null
      },
      "transparentHugePageDefrag": "defer+madvise",
      "transparentHugePageEnabled": "madvise"
    }

But my node still looks like this:

   # kubectl describe nodes aks-deadpoolhp-31863567-vmss000000|grep hugepage  
   Capacity:  
     attachable-volumes-azure-disk:  16  
     cpu:                            8  
     ephemeral-storage:              129901008Ki  
     hugepages-1Gi:                  0  
     hugepages-2Mi:                  0  
     memory:                         32940620Ki  
     pods:                           110  
   Allocatable:  
     attachable-volumes-azure-disk:  16  
     cpu:                            7820m  
     ephemeral-storage:              119716768775  
     hugepages-1Gi:                  0  
     hugepages-2Mi:                  0  
     memory:                         28440140Ki  
     pods:                           110  

My Kubernetes version is 1.16.15.

I also saw that I should enable the feature gate like this: --feature-gates=HugePages=true (https://dev.to/dannypsnl/hugepages-on-kubernetes-5e7p), but I don't know how to do that in AKS. Anyway, as my node is not reporting any HugePages capacity, I'm not sure it would be useful for now.

I even tried recreating the AKS cluster with --kubelet-config, but everything remains the same: I cannot use HugePages...

Please, I need your help again; I'm completely lost with this AKS service...

Azure Kubernetes Service (AKS)

Accepted answer
  1. Gregory Esnaud 96 Reputation points
    2021-01-22T14:35:45.107+00:00
    • Install kubectl-node-shell on your laptop:
      curl -LO https://github.com/kvaps/kubectl-node-shell/raw/master/kubectl-node_shell
      chmod +x ./kubectl-node_shell
      sudo mv ./kubectl-node_shell /usr/local/bin/kubectl-node_shell
    • Find the node your pod is scheduled on: kubectl get pod <YOUR_POD> -o custom-columns=CONTAINER:.spec.nodeName -n <YOUR_NAMESPACE>
    • If the node shows <none>, your pod is in the Pending state. Pick any node from: kubectl get nodes
    • Get inside your node: kubectl node-shell <NODE>
    • Configure HugePages:
      mkdir -p /mnt/huge
      mount -t hugetlbfs nodev /mnt/huge
      echo 1024 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
      cat /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
    • Restart kubelet (still inside the node, yes): systemctl restart kubelet
    • Exit from node-shell with C-d (Ctrl + d)
    • Check that HugePages are on (i.e. the values must not be 0): kubectl describe node <NODE> | grep -i -e "capacity" -e "allocatable" -e "huge"
    • Either check that your pod is no longer in the Pending state, or launch your helm install/kubectl apply now!
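
The final check in the list above (values must not be 0) could also be scripted. The following is only a minimal sketch, assuming Python 3 and that the text comes from `kubectl describe node <NODE>`; the function names `hugepage_capacity` and `hugepages_enabled` are hypothetical helpers, not part of kubectl or AKS.

```python
import re

# Matches lines such as "hugepages-2Mi:  2Gi" in `kubectl describe node` output.
HUGEPAGE_LINE = re.compile(r"^(hugepages-\S+):\s+(\S+)$")


def hugepage_capacity(describe_output: str) -> dict:
    """Collect hugepages-* values under the Capacity and Allocatable sections."""
    result = {}
    section = None
    for raw in describe_output.splitlines():
        line = raw.strip()
        if line in ("Capacity:", "Allocatable:"):
            section = line.rstrip(":")
            result[section] = {}
        elif line.endswith(":"):
            # Any other section header ends the current section.
            section = None
        else:
            match = HUGEPAGE_LINE.match(line)
            if match and section is not None:
                result[section][match.group(1)] = match.group(2)
    return result


def hugepages_enabled(describe_output: str) -> bool:
    """True if at least one hugepages-* resource reports a non-zero value."""
    sections = hugepage_capacity(describe_output)
    return any(value != "0"
               for resources in sections.values()
               for value in resources.values())


# Sample values taken from the node outputs quoted in this thread.
BEFORE = """Capacity:
  cpu:            8
  hugepages-1Gi:  0
  hugepages-2Mi:  0
Allocatable:
  hugepages-1Gi:  0
  hugepages-2Mi:  0
"""
AFTER = BEFORE.replace("hugepages-2Mi:  0", "hugepages-2Mi:  2Gi")

if __name__ == "__main__":
    print("before:", hugepages_enabled(BEFORE))
    print("after: ", hugepages_enabled(AFTER))
```

To use it against a live node, the output of `kubectl describe node <NODE>` can be passed in as a string, for example via `subprocess.run(..., capture_output=True, text=True).stdout`.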

2 additional answers

  1. shiva patpi 13,256 Reputation points Microsoft Employee
    2021-01-22T00:17:47.45+00:00

    Hello @Gregory Esnaud ,
    I just tried all the steps you shared in the document, and I was able to enable huge pages on the Kubernetes cluster. I tried both the older version, 1.16.15, and the latest version.
    In the end I can see the expected output on the VMSS node, as below:

    kubectl describe node aks-mynodepool1-33705488-vmss000000

    Addresses:
      Hostname:    aks-mynodepool1-33705488-vmss000000
      InternalIP:  10.240.0.7
    Capacity:
      attachable-volumes-azure-disk:  8
      cpu:                            2
      ephemeral-storage:              129900528Ki
      hugepages-1Gi:                  0
      hugepages-2Mi:                  2Gi
      memory:                         7121292Ki
      pods:                           110
    Allocatable:
      attachable-volumes-azure-disk:  8
      cpu:                            1900m
      ephemeral-storage:              119716326407
      hugepages-1Gi:                  0
      hugepages-2Mi:                  2Gi
      memory:                         2578828Ki

    I deployed the same pod described in the document https://kubernetes.io/docs/tasks/manage-hugepages/scheduling-hugepages/
    but I used medium: HugePages (the second YAML example).

    Output of the describe pod:
    kubectl describe pod huge-pages-example
        State:          Running
          Started:      Thu, 21 Jan 2021 16:06:05 -0800
        Ready:          True
        Restart Count:  0
        Limits:
          hugepages-2Mi:  100Mi
          memory:         100Mi
        Requests:
          hugepages-2Mi:  100Mi
          memory:         100Mi
        Environment:      <none>
        Mounts:
          /hugepages from hugepage (rw)
          /var/run/secrets/kubernetes.io/serviceaccount from default-token-bqz8v (ro)
    Conditions:
      Type              Status
      Initialized       True
      Ready             True
      ContainersReady   True
      PodScheduled      True
    Volumes:
      hugepage:
        Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
        Medium:  HugePages
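
For reference, a pod matching the describe output above could be generated as follows. This is a rough sketch, not the exact manifest used in this answer: the pod name, the `hugepage` volume, the `/hugepages` mount path and the 100Mi values come from the output above, while the container name, the `fedora:latest` image and the `sleep inf` command are assumptions. The function `hugepage_pod_manifest` is a hypothetical helper. HugePages requests must equal limits, so the same dictionary is used for both.

```python
import json


def hugepage_pod_manifest(name="huge-pages-example", size="100Mi",
                          image="fedora:latest"):
    """Build a pod manifest that requests 2Mi huge pages.

    The image and command are assumptions; any long-running container works.
    """
    # HugePages requests must equal limits, so both use the same values.
    resources = {"hugepages-2Mi": size, "memory": size}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": "example",  # assumed container name
                "image": image,
                "command": ["sleep", "inf"],
                "volumeMounts": [
                    {"mountPath": "/hugepages", "name": "hugepage"},
                ],
                "resources": {
                    "limits": dict(resources),
                    "requests": dict(resources),
                },
            }],
            "volumes": [
                # emptyDir backed by huge pages, as in "Medium: HugePages".
                {"name": "hugepage", "emptyDir": {"medium": "HugePages"}},
            ],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(hugepage_pod_manifest(), indent=2))
```

Saved as a script (the file name `make_pod.py` is only an example), the output can be piped straight into kubectl: `python make_pod.py | kubectl apply -f -`.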

    Here are the detailed steps I followed:

    1) Followed the steps in this document as is:
    https://learn.microsoft.com/en-us/azure/aks/custom-node-configuration
    2) SSH'd into one of the nodes and ran the commands below:

    2.1) azureuser@aks-mynodepool1-33705488-vmss000000:~$ sudo mkdir -p /mnt/huge
    2.2) azureuser@aks-mynodepool1-33705488-vmss000000:~$ sudo mount -t hugetlbfs nodev /mnt/huge
    2.3) root@aks-mynodepool1-33705488-vmss000000:/home/azureuser# echo 1024 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
    2.4) root@aks-mynodepool1-33705488-vmss000000:/home/azureuser# cat /proc/meminfo | grep Huge
    AnonHugePages: 67584 kB
    ShmemHugePages: 0 kB
    FileHugePages: 0 kB
    HugePages_Total: 1024
    HugePages_Free: 1024
    HugePages_Rsvd: 0
    HugePages_Surp: 0
    Hugepagesize: 2048 kB
    Hugetlb: 2097152 kB
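
The numbers above line up with what the node later reports: 1024 pages of 2048 kB each is 2097152 kB, i.e. 2 GiB, which is the `hugepages-2Mi: 2Gi` capacity. A small sketch of that arithmetic, assuming Python 3 and text in the `/proc/meminfo` format shown above; the helper names are hypothetical.

```python
def parse_meminfo_hugepages(text: str) -> dict:
    """Return the huge-page related fields of /proc/meminfo as integers."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        key = key.strip()
        if not sep or "Huge" not in key:
            continue
        parts = rest.split()
        if parts:
            # The first token is the number; a trailing "kB" unit is ignored.
            fields[key] = int(parts[0])
    return fields


def reserved_hugepage_kib(fields: dict) -> int:
    """Memory reserved for the default huge page size, in kB."""
    return fields["HugePages_Total"] * fields["Hugepagesize"]


# Sample copied from the output quoted in this answer.
SAMPLE = """AnonHugePages: 67584 kB
ShmemHugePages: 0 kB
FileHugePages: 0 kB
HugePages_Total: 1024
HugePages_Free: 1024
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 2097152 kB
"""

if __name__ == "__main__":
    info = parse_meminfo_hugepages(SAMPLE)
    kib = reserved_hugepage_kib(info)
    print(kib, "kB reserved =", kib // (1024 * 1024), "GiB")
```

On a node, the same functions can be fed the real file with `open("/proc/meminfo").read()`.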

    Then I checked the node again: kubectl describe node aks-mynodepool1-33705488-vmss000000

    Capacity:
      attachable-volumes-azure-disk:  8
      cpu:                            2
      ephemeral-storage:              129900528Ki
      hugepages-1Gi:                  0
      hugepages-2Mi:                  2Gi
      memory:                         7121292Ki
      pods:                           110
    Allocatable:
      attachable-volumes-azure-disk:  8
      cpu:                            1900m
      ephemeral-storage:              119716326407
      hugepages-1Gi:                  0
      hugepages-2Mi:                  2Gi
      memory:                         2578828Ki

    Kindly let me know if you need additional help. Just follow the first document and you should be good.


  2. Gregory Esnaud 96 Reputation points
    2021-01-22T08:27:13.283+00:00

    Hi @shiva patpi ,

    In the meantime I opened an Azure support ticket.

    They told me roughly the same as you:

     Spawn a node
     SSH into it
     Activate HugePages
     Restart kubelet

    My question was more about activating it directly through a kubeletconfig.json with az aks nodepool add --kubelet-config or az aks create --kubelet-config.
    The answer is that this kind of feature is still in preview.

    But that's OK for me; I now know how to get inside a node. (I will write up the procedure from Microsoft support, who were very responsive by the way 👍)

