Use GPUs with Azure Functions on Azure Container Apps

In this guide, you create and deploy a GPU-enabled function app on Azure Container Apps. You package your function code with GPU libraries into a container image, then deploy it to access NVIDIA T4 or A100 GPUs on demand.

By combining Azure Functions' serverless model with GPU compute on Container Apps, you can run compute-intensive AI inference, image processing, and machine learning workloads that automatically scale based on demand. You pay only for GPU compute time when your functions are actively processing requests.

Overview

When you host Azure Functions on Azure Container Apps, you can access serverless GPUs with NVIDIA A100 and T4 resources. Serverless GPUs scale to zero when idle, so you're billed only for active compute time.

GPU-enabled functions require:

A Container Apps environment with GPU workload profiles
A custom container image that includes the Functions runtime and GPU libraries (CUDA, cuDNN, AI frameworks)
GPU quota approved for your Azure subscription

Prerequisites

Before you start, verify that you have the following items:

An Azure subscription with an active account. Create an account for free.
GPU quota approved for your subscription. Request GPU quota if needed. Enterprise and pay-as-you-go subscriptions typically have A100 and T4 quota enabled by default.
Azure CLI version 2.62.0 or later.
Azure Functions Core Tools version 4.x.
Docker installed locally to build container images.

Container image requirements for GPU

To run Azure Functions with GPU on Container Apps, you must provide a custom container image that includes the Functions runtime, GPU libraries, and your application code. This section describes what your image must contain.

Base image

Your container image must start with an official Azure Functions base image that includes the Functions host runtime. Choose the image that matches your runtime:

Runtime	Base Image
Python 3.11	`mcr.microsoft.com/azure-functions/python:4-python3.11`
Node.js 20	`mcr.microsoft.com/azure-functions/node:4-node20`
.NET 8	`mcr.microsoft.com/azure-functions/dotnet-isolated:4-dotnet-isolated8.0`
Java 17	`mcr.microsoft.com/azure-functions/java:4-java17`
Custom handler	`mcr.microsoft.com/azure-functions/base:4`

The quickstart image (mcr.microsoft.com/k8se/gpu-quickstart:latest) is useful for testing GPU access in your environment, but it doesn't include the Functions runtime. Use it only to validate that GPU support is working.

Tip

When you create a Functions project with func init --docker, the generated Dockerfile already uses the correct base image for your chosen runtime.

CUDA and GPU libraries

Azure Container Apps provides both the NVIDIA driver and a platform-provided CUDA runtime (currently CUDA 12.x). If your application can use the platform CUDA version, you don't need extra CUDA setup. However, you must include AI/ML frameworks and additional libraries like cuDNN in your container image.

Choose one of these approaches:

Recommended: Use the platform-provided CUDA runtime with GPU frameworks

Most AI/ML frameworks, such as PyTorch and TensorFlow, publish pip packages that bundle the CUDA libraries they need. Install the build whose CUDA version matches the platform version. The cu121 wheels in this example target CUDA 12.1:

FROM mcr.microsoft.com/azure-functions/python:4-python3.11

# PyTorch includes CUDA runtime
RUN pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

COPY . /home/site/wwwroot
ENV AzureWebJobsScriptRoot=/home/site/wwwroot
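The index URL in the Dockerfile above encodes the CUDA version as a tag (cu121 for CUDA 12.1). As a minimal sketch, a build script could derive that URL from a version string. The helper name pytorch_index_url is hypothetical, and the function only formats the tag: it doesn't check that PyTorch publishes wheels for that CUDA version.

```python
import re

# Base of the PyTorch wheel index used in the Dockerfile above.
PYTORCH_WHEEL_BASE = "https://download.pytorch.org/whl"


def pytorch_index_url(cuda_version: str) -> str:
    """Format the wheel index URL for a CUDA version such as '12.1'.

    Hypothetical helper: it only builds the 'cuXYZ' tag and does not
    confirm that wheels exist for that CUDA version.
    """
    match = re.fullmatch(r"(\d+)\.(\d+)(?:\.\d+)?", cuda_version.strip())
    if match is None:
        raise ValueError(f"Unrecognized CUDA version: {cuda_version!r}")
    major, minor = match.groups()
    return f"{PYTORCH_WHEEL_BASE}/cu{major}{minor}"


print(pytorch_index_url("12.1"))  # https://download.pytorch.org/whl/cu121
```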

Alternative: Pin a specific CUDA version using multi-stage build

If your application requires a CUDA version different from the platform default, use a multi-stage build:

# Stage 1: CUDA runtime
FROM nvidia/cuda:12.1.0-runtime-ubuntu22.04 AS cuda-base

# Stage 2: Functions image with CUDA libraries
FROM mcr.microsoft.com/azure-functions/python:4-python3.11

# Copy CUDA libraries from the NVIDIA base image
COPY --from=cuda-base /usr/local/cuda /usr/local/cuda
ENV PATH="/usr/local/cuda/bin:${PATH}"
ENV LD_LIBRARY_PATH="/usr/local/cuda/lib64:${LD_LIBRARY_PATH}"

COPY requirements.txt /
RUN pip install -r /requirements.txt

COPY . /home/site/wwwroot
ENV AzureWebJobsScriptRoot=/home/site/wwwroot

Important

If you provide your own CUDA runtime, verify it's compatible with the NVIDIA driver version on the platform. Generally, the NVIDIA driver is backward-compatible with older CUDA versions. Check the NVIDIA CUDA compatibility matrix for details. If you use the platform-provided CUDA, verify your application works with the current platform versions.
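One way to sanity-check compatibility is to compare your image's CUDA runtime version with the "CUDA Version" field in the nvidia-smi header, which reports the highest CUDA version the installed driver supports. The following is a conservative sketch, not a replacement for the NVIDIA compatibility matrix. The function names are hypothetical, and the sample header in the usage note is illustrative rather than captured from the platform.

```python
import re


def driver_max_cuda(smi_output: str) -> tuple[int, int]:
    """Extract the (major, minor) CUDA version from nvidia-smi header text."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        raise ValueError("No 'CUDA Version' field found in nvidia-smi output")
    return int(match.group(1)), int(match.group(2))


def runtime_is_supported(runtime_version: str, smi_output: str) -> bool:
    """Return True if the runtime version is not newer than the driver's maximum.

    Conservative assumption: a runtime newer than the reported version is
    treated as unsupported, even though NVIDIA documents some exceptions.
    """
    major, minor = (int(part) for part in runtime_version.split(".")[:2])
    return (major, minor) <= driver_max_cuda(smi_output)
```

For example, with a header line containing "CUDA Version: 12.2", runtime_is_supported("12.1.0", header) returns True and runtime_is_supported("12.4", header) returns False.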

Optimize container image size

GPU container images are typically large (5-15 GB) because they include CUDA libraries and model files. Large images increase pull times and cold start latency. Use these strategies to reduce startup time:

Use multi-stage Docker builds to exclude build dependencies from the final image.
Store large model files in Azure Storage mounts instead of bundling them in the image.
Enable artifact streaming on your Azure Container Registry (Premium SKU required) for faster image pulls.
Use .dockerignore to exclude unnecessary files from the build context.

Sample Dockerfiles and resources

For examples and templates, see:

Azure Functions on Container Apps GPU sample: Complete example with Dockerfile, function code, and deployment configuration.
Azure Container Apps GPU templates: Templates for deploying models to serverless GPUs.
Azure Functions base images: Official Functions runtime images for all supported languages.

Create and deploy a GPU-enabled function app

In this section, you create a GPU-enabled function, package it in a container, and deploy it to Azure Container Apps.

Step 1: Create a Functions project with Docker support

Run the func init command to create a new Functions project with a Dockerfile:

func init MyGpuFunctionApp --worker-runtime python --docker
cd MyGpuFunctionApp

Step 2: Add a function

Run the func new command to create an HTTP-triggered function:

func new --name GpuProcess --template "HTTP trigger"

Step 3: Update the Dockerfile for GPU support

Edit your Dockerfile to add GPU dependencies on top of the Functions base image. This example uses PyTorch for AI inference:

FROM mcr.microsoft.com/azure-functions/python:4-python3.11

ENV AzureWebJobsScriptRoot=/home/site/wwwroot

# Install GPU dependencies (PyTorch wheels built for CUDA 12.1)
RUN pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

# Copy requirements.txt into the image before installing from it
COPY requirements.txt /
RUN pip install -r /requirements.txt

COPY . /home/site/wwwroot

Important

The platform provides a CUDA runtime by default. If your workload requires specific CUDA versions or additional GPU libraries, include them in your container image. Verify compatibility with the GPU software stack versions supported by Azure Container Apps.

Step 4: Create a Container Apps environment with GPU

Set your environment variables:

RESOURCE_GROUP="myGpuFunctionRg"
ENVIRONMENT_NAME="myGpuEnv"
LOCATION="swedencentral"

Create a resource group:

az group create --name $RESOURCE_GROUP --location $LOCATION

Create a Container Apps environment:

az containerapp env create \
  --name $ENVIRONMENT_NAME \
  --resource-group $RESOURCE_GROUP \
  --location $LOCATION

Add a GPU workload profile to your environment:

az containerapp env workload-profile add \
  --name $ENVIRONMENT_NAME \
  --resource-group $RESOURCE_GROUP \
  --workload-profile-name gpu-t4 \
  --workload-profile-type Consumption-GPU-NC8as-T4

Step 5: Build and push the container image

Set your container registry variables:

REGISTRY_NAME="myGpuRegistry"

Create a container registry (Premium SKU enables artifact streaming for faster image pulls):

az acr create \
  --name $REGISTRY_NAME \
  --resource-group $RESOURCE_GROUP \
  --sku Premium

Build and push the container image:

az acr build \
  --registry $REGISTRY_NAME \
  --image my-gpu-function:v1 \
  .

Step 6: Deploy the function app with GPU

Set your app variables:

APP_NAME="myGpuFunction"

Deploy the container app with GPU support:

az containerapp create \
  --name $APP_NAME \
  --resource-group $RESOURCE_GROUP \
  --environment $ENVIRONMENT_NAME \
  --image $REGISTRY_NAME.azurecr.io/my-gpu-function:v1 \
  --registry-server $REGISTRY_NAME.azurecr.io \
  --workload-profile-name gpu-t4 \
  --cpu 8.0 \
  --memory 56.0Gi \
  --ingress external \
  --target-port 80 \
  --kind functionapp \
  --min-replicas 0 \
  --max-replicas 5

The --kind functionapp flag enables Azure Functions integration. Setting --min-replicas 0 enables scale-to-zero behavior for cost savings.

Verify the deployment

After deployment completes, test that your function is running:

Get the function app URL:

az containerapp show \
  --name $APP_NAME \
  --resource-group $RESOURCE_GROUP \
  --query "properties.configuration.ingress.fqdn" \
  --output tsv

Call your function:

curl https://<YOUR-FUNCTION-URL>/api/GpuProcess

If the function responds successfully, your deployment is working. You can now call it with your own data.
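When the app has scaled to zero, the first request can take a while because the platform has to pull the image and start a GPU replica. The following sketch shows one way to call the function from a script with a generous timeout and a few retries. The 120-second timeout, 10-second delay, and three attempts are illustrative assumptions, not platform recommendations, and the function names are hypothetical.

```python
import time
import urllib.request
from typing import Callable


def fetch(url: str, timeout: float = 120.0) -> str:
    """GET a URL and return the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def call_with_retry(
    fetch_fn: Callable[[], str],
    attempts: int = 3,
    delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_fn until it succeeds or the attempts run out.

    OSError covers urllib's URLError and HTTPError as well as socket timeouts.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch_fn()
        except OSError as error:
            last_error = error
            if attempt < attempts:
                sleep(delay)  # Give the replica time to start.
    raise RuntimeError(f"No response after {attempts} attempts") from last_error


# Usage (replace the placeholder with your app's FQDN):
# body = call_with_retry(lambda: fetch("https://<YOUR-FUNCTION-URL>/api/GpuProcess"))
```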

Optimize cold start

GPU workloads often involve large container images and model files. These strategies reduce startup latency:

Enable artifact streaming: Use artifact streaming on your Azure Container Registry (Premium SKU required) to speed up image pulls.
Use storage mounts: Store large model files in an Azure storage mount instead of bundling them in the container image.
Set minimum replicas: Set --min-replicas 1 to keep one instance warm, so requests that instance can serve don't wait for a cold start. A warm GPU replica is billed continuously, so reserve this setting for production workloads with strict latency requirements.
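To illustrate the storage mount strategy, the following sketch resolves a model file from a mounted directory and caches the loaded content so that only the first invocation on a replica pays the load cost. The MODEL_DIR setting, the /mnt/models default path, and the function names are hypothetical. In a real app you'd replace read_bytes with your framework's loader.

```python
import os
from functools import lru_cache
from pathlib import Path

MODEL_DIR_ENV = "MODEL_DIR"        # Hypothetical app setting naming the mount path.
DEFAULT_MODEL_DIR = "/mnt/models"  # Hypothetical default mount path.


def resolve_model_path(filename: str) -> Path:
    """Locate a model file under the mounted directory."""
    base = Path(os.environ.get(MODEL_DIR_ENV, DEFAULT_MODEL_DIR))
    path = base / filename
    if not path.is_file():
        raise FileNotFoundError(f"Model file not found: {path}")
    return path


@lru_cache(maxsize=1)
def load_model_bytes(filename: str) -> bytes:
    """Read the model once per process; later calls reuse the cached result.

    Stand-in for a framework-specific loader.
    """
    return resolve_model_path(filename).read_bytes()
```

Because the cache lives at module level, it survives across invocations handled by the same worker process but is rebuilt whenever a new replica starts.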

For a complete tutorial that demonstrates GPU deployment with performance monitoring, see Tutorial: Deploy AI image generation with serverless GPUs with Azure Functions on ACA.

Considerations and limitations

Keep the following constraints in mind:

One GPU container per app: Only one container in a function app can access the GPU. If you use sidecars, only the first container gets GPU access.
Workload profiles environment required: Serverless GPUs require a workload profiles environment. Consumption-only environments don't support GPU.
Region availability: GPU workload profiles are available only in specific regions. See supported regions.
GPU quota required: You must have GPU quota approved for your subscription. See Request GPU quota.

Monitor GPU usage

Use Azure Container Apps observability tools to monitor GPU utilization and application performance.

Check GPU status in the console

In the Azure portal, go to your container app.
Select Monitoring > Console.
Select your replica and container, and then choose /bin/bash.
Run nvidia-smi to view GPU memory usage, utilization percentage, and running processes.
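If you want the same numbers from code, for example to log them from your function, nvidia-smi can emit machine-readable CSV through its --query-gpu and --format options. The following sketch parses that output. The function names are hypothetical, the sample values in the usage note are illustrative, and the parser assumes every field is numeric, so a field reported as [N/A] would raise an error.

```python
import csv
import io
import subprocess

QUERY_FIELDS = ["name", "memory.used", "memory.total", "utilization.gpu"]


def parse_gpu_csv(text: str) -> list[dict]:
    """Parse 'csv,noheader,nounits' output into one dict per GPU."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not record:
            continue  # Skip blank lines.
        name, used, total, util = (field.strip() for field in record)
        rows.append({
            "name": name,
            "memory_used_mib": int(used),
            "memory_total_mib": int(total),
            "utilization_pct": int(util),
        })
    return rows


def query_gpus() -> list[dict]:
    """Run nvidia-smi inside the container and parse its CSV output."""
    command = [
        "nvidia-smi",
        f"--query-gpu={','.join(QUERY_FIELDS)}",
        "--format=csv,noheader,nounits",
    ]
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return parse_gpu_csv(result.stdout)
```

For a line such as "Tesla T4, 1024, 15360, 37", parse_gpu_csv returns one entry with 1024 MiB used, 15360 MiB total, and 37 percent utilization.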

View logs and metrics

Select Monitoring > Log stream to view real-time container logs.
Select Monitoring > Metrics to view CPU, memory, and replica count metrics.
Select Monitoring > Logs to run KQL queries against your container app's log data.

For more information on observability, see Monitor Azure Container Apps. You can also configure Application Insights for detailed function execution telemetry.

Next steps

Tutorial: Deploy AI image generation with serverless GPUs with Azure Functions on ACA: Step-by-step guide for deploying a complete image generation solution
Azure Functions on Azure Container Apps overview
Using serverless GPUs in Azure Container Apps
Improve cold start for serverless GPUs
Monitor Azure Container Apps


Last updated on 2026-05-16