Tutorial: Deploy AI image generation with serverless GPUs

In this tutorial, you deploy a Stable Diffusion-powered image generator using serverless GPUs in Azure Container Apps. You can deploy this solution either as an Azure Functions app or as a standard container app, depending on your needs.

Serverless GPUs provide on-demand access to GPU compute resources without infrastructure management. You pay only for the GPU time you use, and the solution automatically scales to zero when idle.

In this tutorial, you learn how to:

- Create a Container Apps environment with GPU workload profiles
- Deploy an AI image generation API using serverless GPUs
- Test the deployment with text-to-image requests
- Monitor GPU utilization and optimize performance
- Clean up resources to avoid unnecessary costs

Prerequisites

Requirement	Description
Azure subscription	If you don't have one, create a free account.
GPU quota	Request GPU quota access. Approval typically takes one to two business days.
Azure CLI	Install the Azure CLI version 2.62.0 or later.
Azure Developer CLI	Install the Azure Developer CLI for streamlined deployment.
Docker Desktop	Required for local container development. Install Docker Desktop.

Important

Request GPU quota access before starting this tutorial. You can continue reading while you wait for approval, but deployment requires an approved quota.

To verify your tools are installed correctly, run the following commands:

az --version
azd version
docker --version

Architecture overview

This solution uses the following Azure services:

Component	Purpose
Azure Container Apps	Hosts your application with serverless GPU support
GPU workload profile	Provides NVIDIA T4 GPU compute for AI inference
Azure Container Registry	Stores your custom container image
Azure Storage	Required for Azure Functions runtime (Functions deployment only)
Application Insights	Provides monitoring and diagnostics

The application follows a straightforward request flow. A client request first reaches the Container Apps ingress endpoint. Your application processes the request and passes the prompt to the Stable Diffusion model running on the GPU. The model generates an image from the prompt, and the application returns that image to the client in the response.

Cost considerations

Serverless GPUs use per-second billing. Review these cost factors before deploying:

Factor	Impact
GPU type	NVIDIA T4 costs less than A100
Minimum replicas	Set to 0 for development (scales to zero when idle)
Cold start time	First request takes 1-2 minutes (model loading)
Request duration	Image generation typically takes 5-15 seconds

For detailed pricing, see Azure Container Apps pricing.
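
Because billing is per second, a rough cost estimate is simple arithmetic: request duration multiplied by the per-second rate. The following is a minimal sketch of that calculation. The function names and the rate value are hypothetical placeholders, not real Azure prices, and the estimate ignores cold start time and any idle period before the app scales in.

```python
def request_cost(duration_s: float, rate_per_second: float) -> float:
    """Return the GPU cost of one request under per-second billing."""
    if duration_s < 0 or rate_per_second < 0:
        raise ValueError("duration and rate must be non-negative")
    return duration_s * rate_per_second


def daily_cost(requests_per_day: int, avg_duration_s: float,
               rate_per_second: float) -> float:
    """Return the estimated GPU cost of one day's requests."""
    return requests_per_day * request_cost(avg_duration_s, rate_per_second)


if __name__ == "__main__":
    # Hypothetical placeholder rate. Look up the real rate on the pricing page.
    HYPOTHETICAL_RATE = 0.0005
    # 5 and 15 seconds are the typical generation times quoted in the table.
    for duration in (5, 15):
        cost = daily_cost(200, duration, HYPOTHETICAL_RATE)
        print(f"200 requests/day at {duration} s each: {cost:.2f} per day")
```

Substitute the current rate for your GPU type from the pricing page before relying on the result.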

Get the sample code

Clone the sample repository that contains the Azure Functions implementation:

git clone https://github.com/Azure-Samples/function-on-aca-gpu.git
cd function-on-aca-gpu

The repository contains:

File	Purpose
`function_app.py`	HTTP-triggered function for image generation
`requirements.txt`	Python dependencies including the diffusers library
`Dockerfile`	Container image definition with GPU support
`host.json`	Azure Functions configuration
`azure.yaml`	Azure Developer CLI deployment configuration

Deploy by using the Azure portal

Follow these steps to create a GPU-enabled container app and deploy the image generation solution by using the Azure portal.

Create a Container Apps environment with GPU

In the Azure portal, search for Container Apps and select it.
Select Create > Container App.

On the Basics tab, configure the following settings:

Setting	Value
Subscription	Select your Azure subscription
Resource group	Select Create new and enter `rg-gpu-image-gen`
Container app name	Enter `ca-image-gen`
Deployment source	Select Container image
Region	Select Sweden Central

Under Container Apps environment, select Create new.
In the Create Container Apps environment pane, enter cae-gpu-image-gen for the environment name.
Select Create to create the environment.
Select Next: Container >.

Configure the container with GPU

On the Container tab, configure the following settings:

Setting	Value
Name	Enter `gpu-image-gen-container`
Image source	Select Docker Hub or other registries
Image type	Select Public
Registry login server	Enter `mcr.microsoft.com`
Image and tag	Enter `k8se/gpu-quickstart:latest`
Workload profile	Select Consumption - Up to 4 vCPUs, 8 GiB memory
GPU	To enable GPU, select the checkbox
GPU Type	Select Consumption-GPU-NC8as-T4 and select the link to add the profile

Select Next: Ingress >.

Configure ingress

On the Ingress tab, configure the following settings:

Setting	Value
Ingress	Select Enabled
Ingress traffic	Select Accepting traffic from anywhere
Target port	Enter `80`

Select Review + create.
Review your settings and select Create.
Wait for the deployment to complete (approximately 5 minutes), then select Go to resource.

Verify the deployment

On the container app Overview page, copy the Application URL.
Open the URL in a browser to access the image generation interface.

Deploy with Azure CLI

You can deploy by using either the Azure Developer CLI (recommended for the Functions app) or the Azure CLI (for more control over individual resources).

Option A: Deploy as Azure Functions app with azd

The Azure Developer CLI provides the fastest deployment experience for the Azure Functions implementation.

Navigate to the cloned repository:
```
cd function-on-aca-gpu
```
Initialize and deploy the application:
```
azd up
```
When prompted, provide the following values:

Prompt	Value
Environment name	Enter a unique name (for example, `gpufunc-dev`)
Azure location	Select `swedencentral`
Azure subscription	Select your subscription

The deployment takes approximately 15-20 minutes.
When deployment completes, note the endpoint URL displayed in the output.

The azd up command creates the following resources:

Resource	Purpose
Resource group	Container for all resources
Container Apps environment	Hosts the app with GPU workload profile
Container registry	Stores your custom container image
Storage account	Required for Azure Functions runtime
Application Insights	Monitoring and diagnostics
Function App	The image generation API

Option B: Deploy as container app by using Azure CLI

For more control over the deployment, use Azure CLI to create each resource individually.

Set the environment variables:

RESOURCE_GROUP="rg-gpu-image-gen"
ENVIRONMENT_NAME="cae-gpu-image-gen"
LOCATION="swedencentral"
CONTAINER_APP_NAME="ca-image-gen"
CONTAINER_IMAGE="mcr.microsoft.com/k8se/gpu-quickstart:latest"
WORKLOAD_PROFILE_NAME="NC8as-T4"
WORKLOAD_PROFILE_TYPE="Consumption-GPU-NC8as-T4"

This script defines the configuration values used throughout the deployment. The WORKLOAD_PROFILE_TYPE specifies the NVIDIA T4 GPU configuration.

Create the resource group:
```
az group create \
  --name $RESOURCE_GROUP \
  --location $LOCATION \
  --query "properties.provisioningState" \
  --output tsv
```
The command creates a resource group in Sweden Central, which supports GPU workload profiles. The output should display Succeeded.

Create the Container Apps environment:

az containerapp env create \
  --name $ENVIRONMENT_NAME \
  --resource-group $RESOURCE_GROUP \
  --location $LOCATION \
  --query "properties.provisioningState" \
  --output tsv

This command creates the managed environment that hosts your container apps. The output should display Succeeded.

Add the GPU workload profile to your environment:

az containerapp env workload-profile add \
  --name $ENVIRONMENT_NAME \
  --resource-group $RESOURCE_GROUP \
  --workload-profile-name $WORKLOAD_PROFILE_NAME \
  --workload-profile-type $WORKLOAD_PROFILE_TYPE

This command adds the NVIDIA T4 GPU workload profile to your environment. The profile enables GPU compute for containers that require it.

Create the container app with GPU support:

az containerapp create \
  --name $CONTAINER_APP_NAME \
  --resource-group $RESOURCE_GROUP \
  --environment $ENVIRONMENT_NAME \
  --image $CONTAINER_IMAGE \
  --target-port 80 \
  --ingress external \
  --cpu 8.0 \
  --memory 56.0Gi \
  --workload-profile-name $WORKLOAD_PROFILE_NAME \
  --query properties.configuration.ingress.fqdn \
  --output tsv

This command creates the container app and assigns it to the GPU workload profile. The --cpu and --memory values match the T4 profile requirements. The command outputs the application URL.

Copy the output URL for testing in the next section.

Test the image generation API

Note

The first request takes one to two minutes while the model downloads (approximately 5 GB) and loads into GPU memory. Subsequent requests complete in 5-15 seconds.
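
A client that calls the app right after deployment may time out during this cold start. One way to handle that is to retry with increasing delays. The following is a minimal sketch, not part of the sample repository. `call_with_retry` and its parameters are hypothetical names, and the default delays (15, 30, and 60 seconds) are an assumption chosen to span the one to two minute load time.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(send: Callable[[], T],
                    attempts: int = 4,
                    first_delay_s: float = 15.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call send() until it succeeds, doubling the wait after each failure.

    OSError covers TimeoutError, ConnectionError, and urllib.error.URLError.
    It also covers HTTP error responses raised by urllib, so a production
    client would likely avoid retrying on 4xx errors.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = first_delay_s
    for attempt in range(1, attempts + 1):
        try:
            return send()
        except OSError:
            if attempt == attempts:
                raise  # Out of attempts: surface the last error.
            sleep(delay)
            delay *= 2
    raise AssertionError("unreachable")


if __name__ == "__main__":
    # Simulated endpoint that fails twice (cold start) and then succeeds.
    state = {"calls": 0}

    def fake_send() -> str:
        state["calls"] += 1
        if state["calls"] < 3:
            raise TimeoutError("model still loading")
        return "image ready"

    print(call_with_retry(fake_send, first_delay_s=0.1))
```

Injecting `sleep` as a parameter keeps the helper easy to test without real waiting.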

Verify the application is running

Open the application URL in a browser. You should see the image generation interface.

Generate an image using the UI

In the text field, enter a prompt such as:

A friendly robot chef cooking pasta in a cozy kitchen, digital art style

Select Generate Image.
Wait for the image to appear. The first generation takes longer due to model loading.

Generate an image using the API (Functions deployment)

If you deployed the Azure Functions version, you can call the API directly:

curl -X POST "https://<YOUR-FUNCTION-URL>/api/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A friendly robot chef cooking pasta in a cozy kitchen",
    "num_steps": 25
  }'

Replace <YOUR-FUNCTION-URL> with your actual function app URL. The num_steps parameter sets the number of inference steps: higher values produce better results but take longer.

Expected response format:

{
  "success": true,
  "image": "iVBORw0KGgoAAAANSUhEUgAA...(base64 PNG data)..."
}

The response contains a base64-encoded PNG image that you can decode and save.
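
As an illustration of the decode step, the following sketch builds the same POST request with Python's standard library and writes the returned image to disk. It assumes the `/api/generate` path and the `success` and `image` response fields shown in this section. The helper names (`build_request`, `decode_image`, `generate_to_file`) and the 300-second timeout are hypothetical and don't come from the sample repository.

```python
import base64
import json
import urllib.request

# Every PNG file starts with these eight bytes.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def build_request(base_url: str, prompt: str,
                  num_steps: int = 25) -> urllib.request.Request:
    """Build the POST request for the image generation endpoint."""
    body = json.dumps({"prompt": prompt, "num_steps": num_steps}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def decode_image(response_body: str) -> bytes:
    """Extract and validate the PNG bytes from the JSON response."""
    payload = json.loads(response_body)
    if not payload.get("success"):
        raise ValueError("image generation did not succeed")
    image = base64.b64decode(payload["image"])
    if not image.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data is not a PNG image")
    return image


def generate_to_file(base_url: str, prompt: str, path: str,
                     num_steps: int = 25, timeout_s: float = 300.0) -> None:
    """Send the request and save the generated image to path."""
    request = build_request(base_url, prompt, num_steps)
    # The long timeout is an assumption to allow for a cold start.
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        image = decode_image(response.read().decode("utf-8"))
    with open(path, "wb") as output:
        output.write(image)
```

For example, `generate_to_file("https://<YOUR-FUNCTION-URL>", "A friendly robot chef", "robot.png")` would write the image to `robot.png`.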

Monitor GPU usage

Monitoring helps you understand GPU utilization and optimize costs.

View GPU status in the console

In the Azure portal, go to your container app.
Under Monitoring, select Console.
Select your replica and container.
Select Reconnect, and then choose /bin/bash as the startup command.
Run the following command to view GPU status:
```
nvidia-smi
```
The output shows GPU memory usage, utilization percentage, and running processes.
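
To track these values over time instead of reading them by eye, you can ask `nvidia-smi` for machine-readable output with its `--query-gpu` and `--format=csv,noheader,nounits` options and parse the result. The following is a minimal sketch of such a parser. The `GpuStatus` class, the function name, and the sample numbers are illustrative assumptions, not output captured from this deployment.

```python
from dataclasses import dataclass
from typing import List

# Command that produces the CSV this parser expects (one line per GPU).
QUERY_COMMAND = (
    "nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total "
    "--format=csv,noheader,nounits"
)


@dataclass
class GpuStatus:
    utilization_pct: int
    memory_used_mib: int
    memory_total_mib: int

    @property
    def memory_used_fraction(self) -> float:
        """Fraction of GPU memory in use, from 0.0 to 1.0."""
        return self.memory_used_mib / self.memory_total_mib


def parse_gpu_status(csv_output: str) -> List[GpuStatus]:
    """Parse 'utilization, used, total' lines into GpuStatus records."""
    statuses = []
    for line in csv_output.strip().splitlines():
        util, used, total = (int(field.strip()) for field in line.split(","))
        statuses.append(GpuStatus(util, used, total))
    return statuses


if __name__ == "__main__":
    sample = "37, 5120, 15360\n"  # Illustrative values only.
    for gpu in parse_gpu_status(sample):
        print(f"GPU busy {gpu.utilization_pct}%, "
              f"memory {gpu.memory_used_fraction:.0%} used")
```

A memory fraction that stays close to 1.0 suggests the model is near the GPU memory limit.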

View metrics in Azure Monitor

In the Azure portal, go to your container app.
Under Monitoring, select Metrics.
Add metrics for:
- CPU Usage
- Memory Usage
- Replica Count

For detailed observability options, see Monitor Azure Container Apps.

Optimize cold start performance

To reduce cold start time for production workloads:

Enable artifact streaming to speed up container image pulls.
Set minimum replicas to 1 to keep an instance warm:
```
az containerapp update \
  --name $CONTAINER_APP_NAME \
  --resource-group $RESOURCE_GROUP \
  --min-replicas 1
```
This command keeps one instance always running, eliminating cold start delays but incurring continuous costs.
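
To put a number on that trade-off, the sketch below estimates what a warm replica costs for the seconds it spends idle each day. The function name and the rate are hypothetical. It also assumes idle seconds are billed at the same per-second rate as busy seconds, which may not match actual pricing.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def warm_replica_idle_cost(busy_seconds_per_day: float,
                           rate_per_second: float) -> float:
    """Return the daily cost of the time an always-on replica sits idle."""
    if not 0 <= busy_seconds_per_day <= SECONDS_PER_DAY:
        raise ValueError("busy seconds must be between 0 and 86400")
    idle_seconds = SECONDS_PER_DAY - busy_seconds_per_day
    return idle_seconds * rate_per_second


if __name__ == "__main__":
    # Hypothetical placeholder rate. Look up the real rate on the pricing page.
    HYPOTHETICAL_RATE = 0.0005
    # Example workload: one hour of actual image generation per day.
    extra = warm_replica_idle_cost(3600, HYPOTHETICAL_RATE)
    print(f"Idle cost of keeping one replica warm: {extra:.2f} per day")
```

If the idle cost outweighs the value of avoiding a one to two minute cold start, keeping minimum replicas at 0 is the cheaper choice.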

For more optimization techniques, see Improve cold start for serverless GPUs.

Troubleshooting

Issue	Cause	Solution
"GPU quota exceeded" error	No GPU quota approved	Request GPU quota and wait for approval
Container fails to start	Image pull timeout	Enable artifact streaming or use a smaller base image
First request times out	Model download in progress	Wait 2-3 minutes and retry. This short delay is expected behavior.
"CUDA out of memory" error	Model exceeds GPU memory	Reduce batch size or use a smaller model variant
502 Bad Gateway	Container not ready	Check container logs; ensure health probes are configured
Slow image generation	Too many inference steps	Reduce the `num_steps` parameter (lower values are faster but reduce image quality)

To view container logs for debugging:

az containerapp logs show \
  --name $CONTAINER_APP_NAME \
  --resource-group $RESOURCE_GROUP \
  --follow

This command streams real-time logs from your container, helping you identify startup issues or runtime errors.

Clean up resources

When you finish with the resources, delete them to avoid ongoing charges.

In the Azure portal, search for Resource groups.
Select the resource group you created (for example, rg-gpu-image-gen).
Select Delete resource group.
To confirm deletion, enter the resource group name.
Select Delete.

If you deployed by using Azure Developer CLI:

azd down

If you deployed by using Azure CLI:

az group delete --name $RESOURCE_GROUP --yes --no-wait

The --no-wait flag returns immediately while deletion continues in the background.

Next steps

Improve cold start for serverless GPUs

Last updated on 2026-02-12