Hi,
Thanks for reaching out to Microsoft Q&A.
- Choose a GPU-enabled Azure service (VM, AKS, AML, ACI)
- Deploy your API on a GPU-enabled VM/container
- Modify your ML code to leverage GPU (PyTorch, TensorFlow, etc.)
- Test and monitor GPU usage with `nvidia-smi`
- Enable auto-scaling for high-traffic workloads
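As a minimal sketch of the "modify your ML code" step above (assuming PyTorch and a hypothetical 768-dimensional embedding model; the real model and dimensions will differ), the usual pattern is to detect a GPU, move both the model and the inputs to it, and fall back to CPU when none is present:

```python
import torch

# Use the GPU when one is available; otherwise fall back to CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in for an embedding model (768-dim, BERT-style)
model = torch.nn.Linear(768, 768).to(device)
model.eval()

# The input batch must live on the same device as the model
batch = torch.randn(4, 768, device=device)
with torch.no_grad():
    embeddings = model(batch)

print(embeddings.shape)  # one embedding per input row
```

The same pattern applies to real embedding libraries (e.g. sentence-transformers accepts a `device` argument); the key point is that both model weights and input tensors must be on the GPU for it to be used.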
Since you're running an embedding model via an API, you need to ensure that your service supports GPU acceleration. Here are common options:
- Azure Machine Learning (AML) Compute Instance or Cluster (for ML workloads)
- Azure Kubernetes Service (AKS) with GPU nodes (for scalable APIs)
- Azure Virtual Machines (VMs) with GPU (for dedicated model inference)
- Azure Container Instances (ACI) with GPU (for lightweight containerized inference)
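For the dedicated-VM option, a GPU VM can be provisioned from the Azure CLI. A hedged sketch (the resource group and VM name below are placeholders to adapt, and `Standard_NC6s_v3` is just one V100-backed NC-series size):

```shell
# Sketch only: resource names are placeholders, not a tested deployment
az vm create \
  --resource-group my-embeddings-rg \
  --name embed-gpu-vm \
  --image Ubuntu2204 \
  --size Standard_NC6s_v3 \
  --admin-username azureuser \
  --generate-ssh-keys

# Install the NVIDIA driver via the GPU driver VM extension
az vm extension set \
  --resource-group my-embeddings-rg \
  --vm-name embed-gpu-vm \
  --name NvidiaGpuDriverLinux \
  --publisher Microsoft.HpcCompute
```

Note that GPU sizes require available quota in your region; you may need to request a quota increase for the NC/ND families first.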
Choose a GPU-Enabled VM
Azure provides several GPU VM series optimized for ML workloads; choose one based on your needs:
| GPU VM Series | GPU Type | Use Case |
|---|---|---|
| NC-series | NVIDIA Tesla K80/V100 | Deep learning, training |
| ND-series | NVIDIA Tesla P40/P100 | AI/ML, training |
| NV-series | NVIDIA Tesla M60 | Graphics, inference |
| ND A100 v4 | NVIDIA A100 | High-performance AI |
In my view, for embedding model inference, the NC, ND, or ND A100 series should be optimal.
Please feel free to click the 'Upvote' (Thumbs-up) button and 'Accept as Answer'. This helps the community by allowing others with similar queries to easily find the solution.