In this tutorial, you deploy a Stable Diffusion-powered image generator using serverless GPUs in Azure Container Apps. You can deploy this solution either as an Azure Functions app or as a standard container app, depending on your needs.
Serverless GPUs provide on-demand access to GPU compute resources without infrastructure management. You pay only for the GPU time you use, and the solution automatically scales to zero when idle.
In this tutorial, you learn how to:
- Create a Container Apps environment with GPU workload profiles
- Deploy an AI image generation API using serverless GPUs
- Test the deployment with text-to-image requests
- Monitor GPU utilization and optimize performance
- Clean up resources to avoid unnecessary costs
Prerequisites
| Requirement | Description |
|---|---|
| Azure subscription | If you don't have one, create a free account. |
| GPU quota | Request GPU quota access. Approval typically takes one to two business days. |
| Azure CLI | Install the Azure CLI version 2.62.0 or later. |
| Azure Developer CLI | Install the Azure Developer CLI for streamlined deployment. |
| Docker Desktop | Required for local container development. Install Docker Desktop. |
Important
Request GPU quota access before starting this tutorial. You can continue reading while you wait for approval, but deployment requires an approved quota.
To verify your tools are installed correctly, run the following commands:
az --version
azd version
docker --version
Architecture overview
This solution uses the following Azure services:
| Component | Purpose |
|---|---|
| Azure Container Apps | Hosts your application with serverless GPU support |
| GPU workload profile | Provides NVIDIA T4 GPU compute for AI inference |
| Azure Container Registry | Stores your custom container image |
| Azure Storage | Required for Azure Functions runtime (Functions deployment only) |
| Application Insights | Provides monitoring and diagnostics |
The application follows a straightforward request flow. A client request first reaches the Container Apps ingress endpoint. Your application processes the request and passes the prompt to the Stable Diffusion model running on the GPU. The model generates an image from the prompt, and the application returns it to the client.
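The flow above can be sketched as a minimal handler with the GPU inference call stubbed out. The function and field names here are illustrative, not the sample repository's actual API:

```python
import base64
import json

def run_stable_diffusion(prompt: str, num_steps: int) -> bytes:
    """Placeholder for the GPU inference call. In the real app this would
    invoke a diffusers pipeline on the T4 GPU and return PNG bytes."""
    return b"\x89PNG..."  # stub: real code returns the rendered image

def handle_request(body: str) -> dict:
    """Models the ingress -> app -> GPU -> response flow."""
    payload = json.loads(body)                   # 1. request arrives via ingress
    image_bytes = run_stable_diffusion(          # 2. app passes the prompt to the model
        payload["prompt"], payload.get("num_steps", 25)
    )
    return {                                     # 3. image is returned to the client
        "success": True,
        "image": base64.b64encode(image_bytes).decode("ascii"),
    }
```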
Cost considerations
Serverless GPUs use per-second billing. Review these cost factors before deploying:
| Factor | Impact |
|---|---|
| GPU type | NVIDIA T4 costs less than A100 |
| Minimum replicas | Set to 0 for development (scales to zero when idle) |
| Cold start time | First request takes 1-2 minutes (model loading) |
| Request duration | Image generation typically takes 5-15 seconds |
For detailed pricing, see Azure Container Apps pricing.
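As a rough illustration of per-second billing, the sketch below estimates GPU cost for a batch of requests. The rate used is a placeholder, not the actual T4 price; substitute current figures from the pricing page.

```python
def estimate_gpu_cost(requests: int, seconds_per_request: float,
                      cold_starts: int = 1, cold_start_seconds: float = 90.0,
                      rate_per_second: float = 0.0003) -> float:
    """Estimate GPU cost as billed seconds times a per-second rate.

    rate_per_second is a made-up placeholder; look up the real T4 rate
    on the Azure Container Apps pricing page.
    """
    billed_seconds = requests * seconds_per_request + cold_starts * cold_start_seconds
    return billed_seconds * rate_per_second

# 100 images at 10 seconds each, plus one 90-second cold start
print(round(estimate_gpu_cost(100, 10.0), 4))
```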
Get the sample code
Clone the sample repository that contains the Azure Functions implementation:
git clone https://github.com/Azure-Samples/function-on-aca-gpu.git
cd function-on-aca-gpu
The repository contains:
| File | Purpose |
|---|---|
| function_app.py | HTTP-triggered function for image generation |
| requirements.txt | Python dependencies including the diffusers library |
| Dockerfile | Container image definition with GPU support |
| host.json | Azure Functions configuration |
| azure.yaml | Azure Developer CLI deployment configuration |
Deploy by using the Azure portal
Follow these steps to create a GPU-enabled container app and deploy the image generation solution by using the Azure portal.
Create a Container Apps environment with GPU
In the Azure portal, search for Container Apps and select it.
Select Create > Container App.
On the Basics tab, configure the following settings:
| Setting | Value |
|---|---|
| Subscription | Select your Azure subscription |
| Resource group | Select Create new and enter rg-gpu-image-gen |
| Container app name | Enter ca-image-gen |
| Deployment source | Select Container image |
| Region | Select Sweden Central |
Under Container Apps environment, select Create new.
In the Create Container Apps environment pane, enter cae-gpu-image-gen for the environment name.
Select Create to create the environment.
Select Next: Container >.
Configure the container with GPU
On the Container tab, configure the following settings:
| Setting | Value |
|---|---|
| Name | Enter gpu-image-gen-container |
| Image source | Select Docker Hub or other registries |
| Image type | Select Public |
| Registry login server | Enter mcr.microsoft.com |
| Image and tag | Enter k8se/gpu-quickstart:latest |
| Workload profile | Select Consumption - Up to 4 vCPUs, 8 GiB memory |
| GPU | Select the checkbox to enable GPU |
| GPU Type | Select Consumption-GPU-NC8as-T4 and select the link to add the profile |
Select Next: Ingress >.
Configure ingress
On the Ingress tab, configure the following settings:
| Setting | Value |
|---|---|
| Ingress | Select Enabled |
| Ingress traffic | Select Accepting traffic from anywhere |
| Target port | Enter 80 |
Select Review + create.
Review your settings and select Create.
Wait for the deployment to complete (approximately 5 minutes), then select Go to resource.
Verify the deployment
On the container app Overview page, copy the Application URL.
Open the URL in a browser to access the image generation interface.
Deploy with Azure CLI
You can deploy by using either the Azure Developer CLI (recommended for the Functions app) or the Azure CLI (for more control over individual resources).
Option A: Deploy as Azure Functions app with azd
The Azure Developer CLI provides the fastest deployment experience for the Azure Functions implementation.
Navigate to the cloned repository:
cd function-on-aca-gpu
Initialize and deploy the application:
azd up
When prompted, provide the following values:
| Prompt | Value |
|---|---|
| Environment name | Enter a unique name (for example, gpufunc-dev) |
| Azure location | Select swedencentral |
| Azure subscription | Select your subscription |
The deployment takes approximately 15-20 minutes.
When deployment completes, note the endpoint URL displayed in the output.
The azd up command creates the following resources:
| Resource | Purpose |
|---|---|
| Resource group | Container for all resources |
| Container Apps environment | Hosts the app with GPU workload profile |
| Container registry | Stores your custom container image |
| Storage account | Required for Azure Functions runtime |
| Application Insights | Monitoring and diagnostics |
| Function App | The image generation API |
Option B: Deploy as container app by using Azure CLI
For more control over the deployment, use Azure CLI to create each resource individually.
Set the environment variables:
RESOURCE_GROUP="rg-gpu-image-gen"
ENVIRONMENT_NAME="cae-gpu-image-gen"
LOCATION="swedencentral"
CONTAINER_APP_NAME="ca-image-gen"
CONTAINER_IMAGE="mcr.microsoft.com/k8se/gpu-quickstart:latest"
WORKLOAD_PROFILE_NAME="NC8as-T4"
WORKLOAD_PROFILE_TYPE="Consumption-GPU-NC8as-T4"
This script defines the configuration values used throughout the deployment. The WORKLOAD_PROFILE_TYPE value specifies the NVIDIA T4 GPU configuration.
Create the resource group:
az group create \
  --name $RESOURCE_GROUP \
  --location $LOCATION \
  --query "properties.provisioningState" \
  --output tsv
The command creates a resource group in Sweden Central, which supports GPU workload profiles. The output should display Succeeded.
Create the Container Apps environment:
az containerapp env create \
  --name $ENVIRONMENT_NAME \
  --resource-group $RESOURCE_GROUP \
  --location $LOCATION \
  --query "properties.provisioningState" \
  --output tsv
This command creates the managed environment that hosts your container apps. The output should display Succeeded.
Add the GPU workload profile to your environment:
az containerapp env workload-profile add \
  --name $ENVIRONMENT_NAME \
  --resource-group $RESOURCE_GROUP \
  --workload-profile-name $WORKLOAD_PROFILE_NAME \
  --workload-profile-type $WORKLOAD_PROFILE_TYPE
This command adds the NVIDIA T4 GPU workload profile to your environment. The profile enables GPU compute for containers that require it.
Create the container app with GPU support:
az containerapp create \
  --name $CONTAINER_APP_NAME \
  --resource-group $RESOURCE_GROUP \
  --environment $ENVIRONMENT_NAME \
  --image $CONTAINER_IMAGE \
  --target-port 80 \
  --ingress external \
  --cpu 8.0 \
  --memory 56.0Gi \
  --workload-profile-name $WORKLOAD_PROFILE_NAME \
  --query properties.configuration.ingress.fqdn \
  --output tsv
This command creates the container app and assigns it to the GPU workload profile. The --cpu and --memory values match the T4 profile requirements. The command outputs the application URL.
Copy the output URL for testing in the next section.
Test the image generation API
Note
The first request takes one to two minutes while the model downloads (approximately 5 GB) and loads into GPU memory. Subsequent requests complete in 5-15 seconds.
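Because the first request can fail or time out while the model loads, a client should retry with a generous backoff. Below is a minimal retry sketch; the transport function is injected so the pattern is independent of any particular HTTP library:

```python
import time

def call_with_retry(send, payload: dict, attempts: int = 3,
                    backoff_seconds: float = 30.0):
    """Call send(payload), retrying on failure to ride out a cold start.

    `send` is any callable that performs the HTTP POST (for example, a
    wrapper around an HTTP client with a long timeout) and raises on error.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return send(payload)
        except Exception as err:   # likely a cold start: model still loading
            last_error = err
            if attempt < attempts - 1:
                time.sleep(backoff_seconds)
    raise last_error
```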
Verify the application is running
Open the application URL in a browser. You should see the image generation interface.
Generate an image using the UI
In the text field, enter a prompt such as:
A friendly robot chef cooking pasta in a cozy kitchen, digital art style
Select Generate Image.
Wait for the image to appear. The first generation takes longer due to model loading.
Generate an image using the API (Functions deployment)
If you deployed the Azure Functions version, you can call the API directly:
curl -X POST "https://<YOUR-FUNCTION-URL>/api/generate" \
-H "Content-Type: application/json" \
-d '{
"prompt": "A friendly robot chef cooking pasta in a cozy kitchen",
"num_steps": 25
}'
Replace <YOUR-FUNCTION-URL> with your actual function app URL. The num_steps parameter controls image quality; higher values produce better results but take longer.
Expected response format:
{
"success": true,
"image": "iVBORw0KGgoAAAANSUhEUgAA...(base64 PNG data)..."
}
The response contains a base64-encoded PNG image that you can decode and save.
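To turn the response into a file, decode the image field and write it as binary. This sketch assumes the response shape shown above:

```python
import base64
import json

def save_image(response_json: str, path: str) -> None:
    """Decode the base64 `image` field of the API response and write it out."""
    payload = json.loads(response_json)
    if not payload.get("success"):
        raise RuntimeError("generation failed")
    with open(path, "wb") as f:
        f.write(base64.b64decode(payload["image"]))
```

For example, pass the curl response body as a string: save_image(resp_text, "robot_chef.png"), then open the file in an image viewer.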
Monitor GPU usage
Monitoring helps you understand GPU utilization and optimize costs.
View GPU status in the console
In the Azure portal, go to your container app.
Under Monitoring, select Console.
Select your replica and container.
Select Reconnect, and then choose /bin/bash as the startup command.
Run the following command to view GPU status:
nvidia-smi
The output shows GPU memory usage, utilization percentage, and running processes.
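For scripted checks, nvidia-smi can emit machine-readable CSV with --query-gpu. The small parser below is a sketch; the sample line mirrors the output format of nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total --format=csv,noheader,nounits:

```python
def parse_gpu_csv(line: str) -> dict:
    """Parse one line of nvidia-smi CSV output into numeric fields."""
    util, mem_used, mem_total = [float(v.strip()) for v in line.split(",")]
    return {
        "utilization_pct": util,
        "memory_used_mib": mem_used,
        "memory_total_mib": mem_total,
        "memory_pct": round(100.0 * mem_used / mem_total, 1),
    }

# Example line, roughly what a 16 GiB T4 might report under load:
print(parse_gpu_csv("87, 5321, 15360"))
```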
View metrics in Azure Monitor
In the Azure portal, go to your container app.
Under Monitoring, select Metrics.
Add metrics for:
- CPU Usage
- Memory Usage
- Replica Count
For detailed observability options, see Monitor Azure Container Apps.
Optimize cold start performance
To reduce cold start time for production workloads:
Enable artifact streaming to speed up container image pulls.
Set minimum replicas to 1 to keep an instance warm:
az containerapp update \
  --name $CONTAINER_APP_NAME \
  --resource-group $RESOURCE_GROUP \
  --min-replicas 1
This command keeps one instance always running, eliminating cold start delays but incurring continuous costs.
For more optimization techniques, see Improve cold start for serverless GPUs.
Troubleshooting
| Issue | Cause | Solution |
|---|---|---|
| "GPU quota exceeded" error | No GPU quota approved | Request GPU quota and wait for approval |
| Container fails to start | Image pull timeout | Enable artifact streaming or use a smaller base image |
| First request times out | Model download in progress | Wait 2-3 minutes and retry; this delay is expected behavior |
| "CUDA out of memory" error | Model exceeds GPU memory | Reduce batch size or use a smaller model variant |
| 502 Bad Gateway | Container not ready | Check container logs; ensure health probes are configured |
| Slow image generation | Insufficient inference steps | Increase num_steps parameter (higher values = better quality, slower) |
To view container logs for debugging:
az containerapp logs show \
--name $CONTAINER_APP_NAME \
--resource-group $RESOURCE_GROUP \
--follow
This command streams real-time logs from your container, helping you identify startup issues or runtime errors.
Clean up resources
When you finish with the resources, delete them to avoid ongoing charges.
In the Azure portal, search for Resource groups.
Select the resource group you created (for example, rg-gpu-image-gen).
Select Delete resource group.
To confirm deletion, enter the resource group name.
Select Delete.
If you deployed by using Azure Developer CLI:
azd down
If you deployed by using Azure CLI:
az group delete --name $RESOURCE_GROUP --yes --no-wait
The --no-wait flag returns immediately while deletion continues in the background.