Deploy a Kubernetes workload using GPU sharing on your Azure Stack Edge Pro GPU device

Article
05/25/2024

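This article describes how containerized workloads can share the GPUs on your Azure Stack Edge Pro GPU device. In this article, you run two jobs: one without GPU context sharing, and another with context sharing enabled through the Multi-Process Service (MPS) on the device. For more information, see Multi-Process Service.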

Prerequisites

Before you begin, make sure that:

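You have access to an Azure Stack Edge Pro GPU device that is activated and has compute configured. You have the Kubernetes API endpoint, and you have added this endpoint to the hosts file on the client that will access the device.
You have access to a client system that runs a supported operating system. If you use a Windows client, the system should run PowerShell 5.0 or later to access the device.
You have created a namespace and a user, and you have granted the user access to this namespace. The kubeconfig file for this namespace is installed on the client system that you use to access the device. For detailed instructions, see Connect to and manage a Kubernetes cluster via kubectl on your Azure Stack Edge Pro GPU device.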

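Save the following deployment yaml on your local system. You'll use this file to run the Kubernetes deployment. The examples in this article save it as C:\gpu-sharing\k8-gpusharing.yaml. This deployment is based on the simple CUDA containers that Nvidia makes publicly available.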

apiVersion: batch/v1
kind: Job
metadata:
  name: cuda-sample1
spec:
  template:
    spec:
      hostPID: true
      hostIPC: true
      containers:
        - name: cuda-sample-container1
          image: nvidia/samples:nbody
          command: ["/tmp/nbody"]
          args: ["-benchmark", "-i=10000"]
          env:
          - name: NVIDIA_VISIBLE_DEVICES
            value: "0"
      restartPolicy: "Never"
  backoffLimit: 1
---

apiVersion: batch/v1
kind: Job
metadata:
  name: cuda-sample2
spec:
  template:
    spec:
      hostPID: true
      hostIPC: true
      containers:
        - name: cuda-sample-container2
          image: nvidia/samples:nbody
          command: ["/tmp/nbody"]
          args: ["-benchmark", "-i=10000"]
          env:
          - name: NVIDIA_VISIBLE_DEVICES
            value: "0"
      restartPolicy: "Never"
  backoffLimit: 1
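If you prefer to generate the two Jobs programmatically rather than maintain the YAML by hand, the following is a minimal Python sketch that builds equivalent manifests. It is not part of the Azure Stack Edge tooling. The helper name cuda_sample_job is hypothetical. Wrapping both Jobs in a Kubernetes v1 List serialized as JSON is also an assumption; kubectl apply -f accepts JSON as well as YAML. The iterations parameter defaults to 10000, the count reported in this article's describe output and logs, so adjust it to match your own file.

```python
import json

def cuda_sample_job(index: int, iterations: int = 10000, gpu: str = "0") -> dict:
    """Hypothetical helper: one Job manifest shaped like the YAML above."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": f"cuda-sample{index}"},
        "spec": {
            "template": {
                "spec": {
                    # Share the host PID and IPC namespaces, as the YAML does.
                    "hostPID": True,
                    "hostIPC": True,
                    "containers": [
                        {
                            "name": f"cuda-sample-container{index}",
                            "image": "nvidia/samples:nbody",
                            "command": ["/tmp/nbody"],
                            "args": ["-benchmark", f"-i={iterations}"],
                            # Both containers are pointed at the same GPU 0.
                            "env": [{"name": "NVIDIA_VISIBLE_DEVICES", "value": gpu}],
                        }
                    ],
                    "restartPolicy": "Never",
                }
            },
            "backoffLimit": 1,
        },
    }

# Assumption: wrap both Jobs in a v1 List so a single file holds both objects.
manifest = {
    "apiVersion": "v1",
    "kind": "List",
    "items": [cuda_sample_job(1), cuda_sample_job(2)],
}

if __name__ == "__main__":
    print(json.dumps(manifest, indent=2))
```

Redirecting the printed JSON to a file gives you something you can pass to kubectl apply -f in place of the YAML.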

Verify GPU driver, CUDA version

The first step is to verify that your device is running the required GPU driver and CUDA versions.

Connect to the PowerShell interface of your device.
Run the following command:
```
Get-HcsGpuNvidiaSmi
```

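In the Nvidia smi output, note the GPU driver version and the CUDA version on your device. If your device is running Azure Stack Edge 2102 software, these correspond to the following versions:

GPU driver version: 460.32.03
CUDA version: 11.2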
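To check the versions without reading the table by eye, you could paste the Get-HcsGpuNvidiaSmi output into a small script. The sketch below is a hypothetical helper. It assumes the header line has the layout "Driver Version: <version>    CUDA Version: <version>" seen in the example output. EXPECTED holds the 2102 values listed above.

```python
import re

# Versions listed in this article for Azure Stack Edge 2102 software.
EXPECTED = {"driver": "460.32.03", "cuda": "11.2"}

# Assumes the nvidia-smi header layout shown in the example output.
HEADER_RE = re.compile(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)")

def parse_versions(smi_output: str) -> dict:
    """Extract the driver and CUDA versions from nvidia-smi text output."""
    match = HEADER_RE.search(smi_output)
    if match is None:
        raise ValueError("no 'Driver Version' header found in nvidia-smi output")
    return {"driver": match.group(1), "cuda": match.group(2)}

sample = "| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |"
versions = parse_versions(sample)
print(versions == EXPECTED)  # prints True for this sample line
```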

Here is an example output:

[10.100.10.10]: PS>Get-HcsGpuNvidiaSmi
K8S-1HXQG13CL-1HXQG13:

Wed Mar  3 12:24:27 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00002C74:00:00.0 Off |                    0 |
| N/A   34C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
[10.100.10.10]: PS>

Keep this session open. You'll use it to view the Nvidia smi output throughout this article.

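Job without context sharing

The first job deploys an application on your device in the namespace mynamesp1. This application deployment also shows that GPU context sharing is not enabled by default.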

List all the pods running in the namespace. Run the following command:

kubectl get pods -n <Name of the namespace>

Here is an example output:

PS C:\WINDOWS\system32> kubectl get pods -n mynamesp1
No resources found.

Start a deployment job on your device using the deployment .yaml you saved earlier. Run the following command:

kubectl apply -f <Path to the deployment .yaml> -n <Name of the namespace>

This job creates two containers and runs an n-body simulation in each of them. The number of simulation iterations is specified in the .yaml file.

Here is an example output:

PS C:\WINDOWS\system32> kubectl apply -f C:\gpu-sharing\k8-gpusharing.yaml -n mynamesp1
job.batch/cuda-sample1 created
job.batch/cuda-sample2 created
PS C:\WINDOWS\system32>
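With kubectl, each flag must be followed directly by its value: -f takes the file path and -n takes the namespace. If you script these steps, a minimal Python sketch that builds the argument list in that order might look like the following. kubectl_argv and run_kubectl are hypothetical helper names. run_kubectl assumes kubectl is on your PATH and that the namespace kubeconfig from the prerequisites is in place.

```python
import subprocess

def kubectl_argv(action: str, manifest_path: str, namespace: str) -> list[str]:
    """Build a kubectl command line for applying or deleting a manifest file.

    Each flag is followed directly by its own value: -f gets the file path
    and -n gets the namespace.
    """
    if action not in ("apply", "delete"):
        raise ValueError(f"unsupported action: {action!r}")
    return ["kubectl", action, "-f", manifest_path, "-n", namespace]

def run_kubectl(action: str, manifest_path: str, namespace: str) -> str:
    """Run kubectl and return its stdout (assumes kubectl is on PATH)."""
    completed = subprocess.run(
        kubectl_argv(action, manifest_path, namespace),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if kubectl exits non-zero
    )
    return completed.stdout

argv = kubectl_argv("apply", r"C:\gpu-sharing\k8-gpusharing.yaml", "mynamesp1")
print(" ".join(argv))
```

Passing a list instead of a single command string avoids shell quoting issues with Windows paths.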

To list the pods started by the deployment, run the following command:

kubectl get pods -n <Name of the namespace>

Here is an example output:

PS C:\WINDOWS\system32> kubectl get pods -n mynamesp1
NAME                 READY   STATUS    RESTARTS   AGE
cuda-sample1-27srm   1/1     Running   0          28s
cuda-sample2-db9vx   1/1     Running   0          27s
PS C:\WINDOWS\system32>

Two pods, cuda-sample1-27srm and cuda-sample2-db9vx, are running on your device.

Fetch the details of the jobs and their pods. Run the following command:

kubectl -n <Name of the namespace> describe job.batch/<Name of the job>

Here is an example output:

PS C:\WINDOWS\system32> kubectl -n mynamesp1 describe job.batch/cuda-sample1;  kubectl -n mynamesp1 describe job.batch/cuda-sample2
Name:           cuda-sample1
Namespace:      mynamesp1
Selector:       controller-uid=22783f76-6af1-490d-b6eb-67dd4cda0e1f
Labels:         controller-uid=22783f76-6af1-490d-b6eb-67dd4cda0e1f
                job-name=cuda-sample1
Annotations:    kubectl.kubernetes.io/last-applied-configuration:
                  {"apiVersion":"batch/v1","kind":"Job","metadata":{"annotations":{},"name":"cuda-sample1","namespace":"mynamesp1"},"spec":{"backoffLimit":1...
Parallelism:    1
Completions:    1
Start Time:     Wed, 03 Mar 2021 12:25:34 -0800
Pods Statuses:  1 Running / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  controller-uid=22783f76-6af1-490d-b6eb-67dd4cda0e1f
           job-name=cuda-sample1
  Containers:
   cuda-sample-container1:
    Image:      nvidia/samples:nbody
    Port:       <none>
    Host Port:  <none>
    Command:
      /tmp/nbody
    Args:
      -benchmark
      -i=10000
    Environment:
      NVIDIA_VISIBLE_DEVICES:  0
    Mounts:                    <none>
  Volumes:                     <none>
Events:
  Type    Reason            Age   From            Message
  ----    ------            ----  ----            -------
  Normal  SuccessfulCreate  60s   job-controller  Created pod: cuda-sample1-27srm
Name:           cuda-sample2
Namespace:      mynamesp1
Selector:       controller-uid=e68c8d5a-718e-4880-b53f-26458dc24381
Labels:         controller-uid=e68c8d5a-718e-4880-b53f-26458dc24381
                job-name=cuda-sample2
Annotations:    kubectl.kubernetes.io/last-applied-configuration:
                  {"apiVersion":"batch/v1","kind":"Job","metadata":{"annotations":{},"name":"cuda-sample2","namespace":"mynamesp1"},"spec":{"backoffLimit":1...
Parallelism:    1
Completions:    1
Start Time:     Wed, 03 Mar 2021 12:25:35 -0800
Pods Statuses:  1 Running / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  controller-uid=e68c8d5a-718e-4880-b53f-26458dc24381
           job-name=cuda-sample2
  Containers:
   cuda-sample-container2:
    Image:      nvidia/samples:nbody
    Port:       <none>
    Host Port:  <none>
    Command:
      /tmp/nbody
    Args:
      -benchmark
      -i=10000
    Environment:
      NVIDIA_VISIBLE_DEVICES:  0
    Mounts:                    <none>
  Volumes:                     <none>
Events:
  Type    Reason            Age   From            Message
  ----    ------            ----  ----            -------
  Normal  SuccessfulCreate  60s   job-controller  Created pod: cuda-sample2-db9vx
PS C:\WINDOWS\system32>

The output shows that the jobs successfully created both pods.
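When you describe several jobs at once, the "Pods Statuses" line is the quickest place to check progress. The following Python sketch pulls those counts out of pasted describe output. The function name pod_statuses is hypothetical, and the regular expression assumes the "N Running / N Succeeded / N Failed" wording shown above.

```python
import re

# Matches the "Pods Statuses" line printed by `kubectl describe job`.
STATUS_RE = re.compile(
    r"Pods Statuses:\s*(\d+) Running / (\d+) Succeeded / (\d+) Failed"
)

def pod_statuses(describe_output: str) -> list[dict[str, int]]:
    """Return the running/succeeded/failed pod counts for each Job described."""
    return [
        {"running": int(r), "succeeded": int(s), "failed": int(f)}
        for r, s, f in STATUS_RE.findall(describe_output)
    ]

sample = (
    "Pods Statuses:  1 Running / 0 Succeeded / 0 Failed\n"
    "Pods Statuses:  1 Running / 0 Succeeded / 0 Failed\n"
)
counts = pod_statuses(sample)
# True when each Job has its single pod running or done and none have failed.
print(all(c["running"] + c["succeeded"] == 1 and c["failed"] == 0 for c in counts))
```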

While both containers are running the n-body simulation, check the GPU utilization in the Nvidia smi output. Go to the PowerShell interface of the device and run Get-HcsGpuNvidiaSmi.

Here is an example output while both containers are running the n-body simulation:

[10.100.10.10]: PS>Get-HcsGpuNvidiaSmi
K8S-1HXQG13CL-1HXQG13:

Wed Mar  3 12:26:41 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00002C74:00:00.0 Off |                    0 |
| N/A   64C    P0    69W /  70W |    221MiB / 15109MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    197976      C   /tmp/nbody                        109MiB |
|    0   N/A  N/A    198051      C   /tmp/nbody                        109MiB |
+-----------------------------------------------------------------------------+
[10.100.10.10]: PS>

As you can see, two n-body simulation processes (Type = C), one for each container, are running on GPU 0.
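The Processes table is the part of the Nvidia smi output that shows how the GPU is shared. The following is a minimal Python sketch that parses its rows, assuming the column layout in the example output (GPU, GI, CI, PID, Type, Process name, GPU memory in MiB). GpuProcess and parse_processes are hypothetical names, and the parser ignores every line that does not look like a process row.

```python
import re
from dataclasses import dataclass

# Assumes the row layout of the nvidia-smi Processes table shown above.
ROW_RE = re.compile(
    r"^\| +(\d+) +(\S+) +(\S+) +(\d+) +(\S+) +(.+?) +(\d+)MiB *\|$",
    re.MULTILINE,
)

@dataclass
class GpuProcess:
    gpu: int
    pid: int
    ptype: str       # process type as printed by nvidia-smi, e.g. "C" or "M+C"
    name: str
    memory_mib: int

def parse_processes(smi_output: str) -> list[GpuProcess]:
    """Return one GpuProcess per row of the Processes table."""
    return [
        GpuProcess(int(gpu), int(pid), ptype, name.strip(), int(mem))
        for gpu, _gi, _ci, pid, ptype, name, mem in ROW_RE.findall(smi_output)
    ]

sample = """\
|    0   N/A  N/A    197976      C   /tmp/nbody                        109MiB |
|    0   N/A  N/A    198051      C   /tmp/nbody                        109MiB |
"""
procs = parse_processes(sample)
for p in procs:
    print(p.gpu, p.pid, p.ptype, p.name, p.memory_mib)
```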

Monitor the n-body simulation by running kubectl get pods. Here is an example output while the simulation is running:

PS C:\WINDOWS\system32> kubectl get pods -n mynamesp1
NAME                 READY   STATUS    RESTARTS   AGE
cuda-sample1-27srm   1/1     Running   0          70s
cuda-sample2-db9vx   1/1     Running   0          69s
PS C:\WINDOWS\system32>

When the simulation is complete, the output shows the pods as Completed. Here is an example output:

PS C:\WINDOWS\system32> kubectl get pods -n mynamesp1
NAME                 READY   STATUS      RESTARTS   AGE
cuda-sample1-27srm   0/1     Completed   0          2m54s
cuda-sample2-db9vx   0/1     Completed   0          2m53s
PS C:\WINDOWS\system32>
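If you poll for completion from a script, you could read the STATUS column of kubectl get pods. The Python sketch below does this by splitting on whitespace. It assumes the default table layout shown above and that no column value contains spaces. pod_phases and all_completed are hypothetical names. For anything beyond a quick check, kubectl get pods -o json gives output that is more robust to parse.

```python
def pod_phases(get_pods_output: str) -> dict[str, str]:
    """Map pod name to the STATUS column of `kubectl get pods` table output."""
    lines = [line for line in get_pods_output.splitlines() if line.strip()]
    if not lines or "STATUS" not in lines[0].split():
        return {}  # e.g. "No resources found."
    status_col = lines[0].split().index("STATUS")
    phases = {}
    for line in lines[1:]:
        fields = line.split()
        phases[fields[0]] = fields[status_col]
    return phases

def all_completed(get_pods_output: str) -> bool:
    """True only if at least one pod is listed and every pod shows Completed."""
    phases = pod_phases(get_pods_output)
    return bool(phases) and all(s == "Completed" for s in phases.values())

sample = """\
NAME                 READY   STATUS      RESTARTS   AGE
cuda-sample1-27srm   0/1     Completed   0          2m54s
cuda-sample2-db9vx   0/1     Completed   0          2m53s
"""
print(all_completed(sample))  # prints True
```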

After the simulation is complete, you can view the logs and the total time the simulation took. Run the following command:

kubectl logs -n <Name of the namespace> <pod name>

Here is an example output:

PS C:\WINDOWS\system32> kubectl logs -n mynamesp1 cuda-sample1-27srm
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
===========// CUT //===================// CUT //=====================  
> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "Turing" with compute capability 7.5

> Compute 7.5 CUDA device: [Tesla T4]
40960 bodies, total time for 10000 iterations: 170398.766 ms
= 98.459 billion interactions per second
= 1969.171 single-precision GFLOP/s at 20 flops per interaction
PS C:\WINDOWS\system32>

PS C:\WINDOWS\system32> kubectl logs -n mynamesp1 cuda-sample2-db9vx
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
===========// CUT //===================// CUT //=====================
> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "Turing" with compute capability 7.5

> Compute 7.5 CUDA device: [Tesla T4]
40960 bodies, total time for 10000 iterations: 170368.859 ms
= 98.476 billion interactions per second
= 1969.517 single-precision GFLOP/s at 20 flops per interaction
PS C:\WINDOWS\system32>
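The throughput figures in these logs are consistent with simple arithmetic. In each iteration, every one of the N bodies interacts with all N bodies, giving N² interactions per iteration. Each interaction is counted as 20 floating-point operations, as the log states. The following Python sketch reproduces the numbers for cuda-sample1. The function name nbody_throughput is hypothetical.

```python
def nbody_throughput(bodies: int, iterations: int, total_ms: float,
                     flops_per_interaction: int = 20) -> tuple[float, float]:
    """Return (billion interactions per second, GFLOP/s) for an all-pairs run.

    Each iteration evaluates bodies * bodies pairwise interactions.
    """
    seconds = total_ms / 1000.0
    interactions_per_s = bodies * bodies * iterations / seconds
    billions = interactions_per_s / 1e9
    return billions, billions * flops_per_interaction

# Values taken from the cuda-sample1 log above.
rate, gflops = nbody_throughput(40960, 10000, 170398.766)
print(f"{rate:.3f} billion interactions/s, {gflops:.3f} GFLOP/s")
# prints "98.459 billion interactions/s, 1969.171 GFLOP/s"
```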

No processes should now be running on the GPU. You can confirm this by checking the GPU utilization in the Nvidia smi output.

[10.100.10.10]: PS>Get-HcsGpuNvidiaSmi
K8S-1HXQG13CL-1HXQG13:

Wed Mar  3 12:32:52 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00002C74:00:00.0 Off |                    0 |
| N/A   38C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
[10.100.10.10]: PS>

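Job with context sharing

The second job deploys the n-body simulation on two CUDA containers with GPU context sharing enabled through MPS. First, enable MPS on the device.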

Connect to the PowerShell interface of the device.

To enable MPS on your device, run the Start-HcsGpuMPS command.

[10.100.10.10]: PS>Start-HcsGpuMPS
K8S-1HXQG13CL-1HXQG13:

Set compute mode to EXCLUSIVE_PROCESS for GPU 00002C74:00:00.0.
All done.
Created nvidia-mps.service
[10.100.10.10]: PS>

Run the job using the same deployment yaml you used earlier. You may need to delete the existing deployment first. See Delete deployment.

Here is an example output:

PS C:\WINDOWS\system32> kubectl -n mynamesp1 delete -f C:\gpu-sharing\k8-gpusharing.yaml
job.batch "cuda-sample1" deleted
job.batch "cuda-sample2" deleted
PS C:\WINDOWS\system32> kubectl get pods -n mynamesp1
No resources found.
PS C:\WINDOWS\system32> kubectl -n mynamesp1 apply -f C:\gpu-sharing\k8-gpusharing.yaml
job.batch/cuda-sample1 created
job.batch/cuda-sample2 created
PS C:\WINDOWS\system32> kubectl get pods -n mynamesp1
NAME                 READY   STATUS    RESTARTS   AGE
cuda-sample1-vcznt   1/1     Running   0          21s
cuda-sample2-zkx4w   1/1     Running   0          21s
PS C:\WINDOWS\system32> kubectl -n mynamesp1 describe job.batch/cuda-sample1;  kubectl -n mynamesp1 describe job.batch/cuda-sample2
Name:           cuda-sample1
Namespace:      mynamesp1
Selector:       controller-uid=ed06bdf0-a282-4b35-a2a0-c0d36303a35e
Labels:         controller-uid=ed06bdf0-a282-4b35-a2a0-c0d36303a35e
                job-name=cuda-sample1
Annotations:    kubectl.kubernetes.io/last-applied-configuration:
                  {"apiVersion":"batch/v1","kind":"Job","metadata":{"annotations":{},"name":"cuda-sample1","namespace":"mynamesp1"},"spec":{"backoffLimit":1...
Parallelism:    1
Completions:    1
Start Time:     Wed, 03 Mar 2021 21:51:51 -0800
Pods Statuses:  1 Running / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  controller-uid=ed06bdf0-a282-4b35-a2a0-c0d36303a35e
           job-name=cuda-sample1
  Containers:
   cuda-sample-container1:
    Image:      nvidia/samples:nbody
    Port:       <none>
    Host Port:  <none>
    Command:
      /tmp/nbody
    Args:
      -benchmark
      -i=10000
    Environment:
      NVIDIA_VISIBLE_DEVICES:  0
    Mounts:                    <none>
  Volumes:                     <none>
Events:
  Type    Reason            Age   From            Message
  ----    ------            ----  ----            -------
  Normal  SuccessfulCreate  46s   job-controller  Created pod: cuda-sample1-vcznt
Name:           cuda-sample2
Namespace:      mynamesp1
Selector:       controller-uid=6282b8fa-e76d-4f45-aa85-653ee0212b29
Labels:         controller-uid=6282b8fa-e76d-4f45-aa85-653ee0212b29
                job-name=cuda-sample2
Annotations:    kubectl.kubernetes.io/last-applied-configuration:
                  {"apiVersion":"batch/v1","kind":"Job","metadata":{"annotations":{},"name":"cuda-sample2","namespace":"mynamesp1"},"spec":{"backoffLimit":1...
Parallelism:    1
Completions:    1
Start Time:     Wed, 03 Mar 2021 21:51:51 -0800
Pods Statuses:  1 Running / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  controller-uid=6282b8fa-e76d-4f45-aa85-653ee0212b29
           job-name=cuda-sample2
  Containers:
   cuda-sample-container2:
    Image:      nvidia/samples:nbody
    Port:       <none>
    Host Port:  <none>
    Command:
      /tmp/nbody
    Args:
      -benchmark
      -i=10000
    Environment:
      NVIDIA_VISIBLE_DEVICES:  0
    Mounts:                    <none>
  Volumes:                     <none>
Events:
  Type    Reason            Age   From            Message
  ----    ------            ----  ----            -------
  Normal  SuccessfulCreate  47s   job-controller  Created pod: cuda-sample2-zkx4w
PS C:\WINDOWS\system32>

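While the simulation is running, you can view the Nvidia smi output. It shows the processes for the two cuda containers running the n-body simulation (type M+C) and the process for the MPS service (type C). All of these processes share GPU 0.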

PS>Get-HcsGpuNvidiaSmi
K8S-1HXQG13CL-1HXQG13:

Mon Mar  3 21:54:50 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 0000E00B:00:00.0 Off |                    0 |
| N/A   45C    P0    68W /  70W |    242MiB / 15109MiB |    100%   E. Process |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    144377    M+C   /tmp/nbody                        107MiB |
|    0   N/A  N/A    144379    M+C   /tmp/nbody                        107MiB |
|    0   N/A  N/A    144443      C   nvidia-cuda-mps-server             25MiB |
+-----------------------------------------------------------------------------+
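Start-HcsGpuMPS reports that it sets the compute mode to EXCLUSIVE_PROCESS. In this output, that mode appears as "E. Process" in the Compute M. column, and the nvidia-cuda-mps-server process is listed. The Python sketch below is a rough heuristic that checks both conditions in pasted Nvidia smi output. compute_mode and mps_active are hypothetical names. The regular expression assumes the GPU-Util/Compute M. layout shown in the examples and only reads the first GPU.

```python
import re

# Matches "| <util>%   <compute mode> |" in the GPU row of nvidia-smi output.
UTIL_MODE_RE = re.compile(r"\| +(\d+)% +(.+?) *\|")

def compute_mode(smi_output: str) -> str:
    """Return the Compute M. column of the first GPU, e.g. 'Default' or 'E. Process'."""
    match = UTIL_MODE_RE.search(smi_output)
    if match is None:
        raise ValueError("GPU-Util / Compute M. field not found")
    return match.group(2)

def mps_active(smi_output: str) -> bool:
    """Heuristic: exclusive-process compute mode plus a listed MPS server process."""
    return (compute_mode(smi_output) == "E. Process"
            and "nvidia-cuda-mps-server" in smi_output)

with_mps = (
    "| N/A   45C    P0    68W /  70W |    242MiB / 15109MiB |    100%   E. Process |\n"
    "|    0   N/A  N/A    144443      C   nvidia-cuda-mps-server             25MiB |\n"
)
without_mps = (
    "| N/A   38C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |\n"
)
print(mps_active(with_mps), mps_active(without_mps))  # prints True False
```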

After the simulation is complete, you can view the logs and the total time the simulation took. Confirm that the pods have completed with kubectl get pods, and then run kubectl logs for each pod. Here is an example output:

PS C:\WINDOWS\system32> kubectl get pods -n mynamesp1
NAME                 READY   STATUS      RESTARTS   AGE
cuda-sample1-vcznt   0/1     Completed   0          5m44s
cuda-sample2-zkx4w   0/1     Completed   0          5m44s
PS C:\WINDOWS\system32> kubectl logs -n mynamesp1 cuda-sample1-vcznt
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
===========// CUT //===================// CUT //=====================
> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "Turing" with compute capability 7.5

> Compute 7.5 CUDA device: [Tesla T4]
40960 bodies, total time for 10000 iterations: 154979.453 ms
= 108.254 billion interactions per second
= 2165.089 single-precision GFLOP/s at 20 flops per interaction

PS C:\WINDOWS\system32> kubectl logs -n mynamesp1 cuda-sample2-zkx4w
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
===========// CUT //===================// CUT //=====================
> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "Turing" with compute capability 7.5

> Compute 7.5 CUDA device: [Tesla T4]
40960 bodies, total time for 10000 iterations: 154986.734 ms
= 108.249 billion interactions per second
= 2164.987 single-precision GFLOP/s at 20 flops per interaction
PS C:\WINDOWS\system32>
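To compare the two jobs, you can divide the total times reported in the logs. The following Python sketch uses the example numbers from this article. It is a single example run on one device, not a rigorous benchmark. The dictionary names and the speedup helper are hypothetical. With these numbers, each container finished roughly 1.1 times faster with MPS, which is about 9% less wall time.

```python
# Total times (ms) from the nbody logs in this article's example runs.
WITHOUT_MPS_MS = {"cuda-sample1": 170398.766, "cuda-sample2": 170368.859}
WITH_MPS_MS = {"cuda-sample1": 154979.453, "cuda-sample2": 154986.734}

def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """Ratio greater than 1 means the candidate run finished faster."""
    return baseline_ms / candidate_ms

for job in WITHOUT_MPS_MS:
    s = speedup(WITHOUT_MPS_MS[job], WITH_MPS_MS[job])
    saved_pct = (1 - 1 / s) * 100  # share of the baseline wall time saved
    print(f"{job}: {s:.3f}x faster, {saved_pct:.1f}% less wall time with MPS")
```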

After the simulation is complete, you can view the Nvidia smi output again. Only the nvidia-cuda-mps-server process of the MPS service is shown as running.

PS>Get-HcsGpuNvidiaSmi
K8S-1HXQG13CL-1HXQG13:

Mon Mar  3 21:59:55 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 0000E00B:00:00.0 Off |                    0 |
| N/A   37C    P8     9W /  70W |     28MiB / 15109MiB |      0%   E. Process |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    144443      C   nvidia-cuda-mps-server             25MiB |
+-----------------------------------------------------------------------------+

Delete deployment

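You may need to delete the deployment before you run the jobs again, for example when you switch between running with MPS disabled and running with MPS enabled on your device.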

To delete the deployment on your device, run the following command:

kubectl delete -f <Path to the deployment .yaml> -n <Name of the namespace>

Here is an example output:

PS C:\WINDOWS\system32> kubectl delete -f 'C:\gpu-sharing\k8-gpusharing.yaml' -n mynamesp1
job.batch "cuda-sample1" deleted
job.batch "cuda-sample2" deleted
PS C:\WINDOWS\system32>

Next steps

Deploy an IoT Edge workload with GPU sharing on your Azure Stack Edge Pro.
