Hello Arun!
Thank you for posting on Microsoft Learn.
While D32 (32 vCPU, 128 GB RAM) offers strong CPU performance, it may fall short for your use case involving real-time image processing across three AI models.
YOLOv8-Nano runs reasonably well on CPU, but YOLOv8-Medium, PaddleOCR, and especially BLIP are computationally intensive. CPU inference times can cause significant latency, especially as concurrency increases.
Each model has different compute profiles:
- YOLOv8-Medium struggles on CPU under real-time demands and works far better on a GPU
- PaddleOCR can work on CPU if optimized, but benefits hugely from GPU acceleration
- BLIP-vqa is transformer-based and performs very poorly on CPU, often taking 2–3 seconds per image; on GPU, inference can drop to 200–500 ms (a quick way to measure this yourself is sketched below)
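If you want to verify those latency numbers on your own hardware before deciding, here is a minimal timing sketch using Hugging Face Transformers. The `Salesforce/blip-vqa-base` checkpoint, the `sample.jpg` path, and the run count are illustrative assumptions; exact numbers will depend on your CPU and GPU:

```python
import time

import torch
from PIL import Image
from transformers import BlipForQuestionAnswering, BlipProcessor

processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

image = Image.open("sample.jpg").convert("RGB")   # placeholder test image
question = "How many people are in the picture?"

def ms_per_image(device: str, runs: int = 5) -> float:
    """Average generate() latency in milliseconds on the given device."""
    m = model.to(device)
    inputs = processor(image, question, return_tensors="pt").to(device)
    m.generate(**inputs)                          # warm-up (kernel init, caches)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(runs):
        m.generate(**inputs)
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / runs * 1000

print(f"CPU: {ms_per_image('cpu'):.0f} ms/image")
if torch.cuda.is_available():
    print(f"GPU: {ms_per_image('cuda'):.0f} ms/image")
```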
Under these conditions, CPU bottlenecks quickly become a problem, while GPU inference lets you meet both latency and concurrency requirements reliably.
Unfortunately, Azure Container Apps does not support GPU SKUs natively as of now. It’s designed for general-purpose workloads and lightweight containerized apps, not for inference-heavy tasks involving models like BLIP or YOLOv8-M.
Consider migrating to a platform that natively supports GPUs:
- Azure Kubernetes Service (AKS) with GPU node pools (NVIDIA T4, A10G, or A100)
- Azure ML Managed Online Endpoints, which let you deploy models on GPU-backed compute (a minimal deployment sketch follows this list)
- Optionally, Azure VMs (such as the NC- or ND-series) if you want manual control
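As a rough illustration of the Azure ML option, here is a sketch using the `azure-ai-ml` (SDK v2) Python package. All resource names, paths, the container image, and the `Standard_NC4as_T4_v3` (T4) SKU are placeholder assumptions you would replace with your own:

```python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    CodeConfiguration,
    Environment,
    ManagedOnlineDeployment,
    ManagedOnlineEndpoint,
    Model,
)
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

# 1) Create the endpoint.
endpoint = ManagedOnlineEndpoint(name="vision-endpoint", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# 2) Deploy the model on a GPU instance (T4 in this example).
deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name="vision-endpoint",
    model=Model(path="./model"),                  # local model folder
    environment=Environment(image="<your-gpu-inference-image>"),
    code_configuration=CodeConfiguration(code="./src", scoring_script="score.py"),
    instance_type="Standard_NC4as_T4_v3",         # NVIDIA T4 GPU SKU
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()
```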
Microsoft and open-source benchmarks show that:
- YOLOv8-M can run at 10–20 FPS on a T4 GPU vs 2–3 FPS on CPU
- BLIP-vqa sees 4–10× latency improvements on GPU
- PaddleOCR processes images in under 100 ms on GPU, compared to up to 1 s on CPU
A good rule of thumb: if CPU inference latency exceeds 500 ms per model, or concurrency exceeds 1–2 QPS per model, consider a GPU.
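If it helps to make that concrete, here is a trivial sketch of the same heuristic. The threshold defaults are the values quoted above; tune them to your SLA:

```python
def should_move_to_gpu(cpu_latency_ms: float, qps: float,
                       latency_budget_ms: float = 500.0,
                       qps_threshold: float = 2.0) -> bool:
    """Return True if CPU latency or sustained load suggests a GPU SKU."""
    return cpu_latency_ms > latency_budget_ms or qps > qps_threshold

# Example: BLIP at ~2500 ms/image on CPU trips the rule immediately.
print(should_move_to_gpu(cpu_latency_ms=2500, qps=1.0))  # True
```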
If your goal is real-time processing at scale, sticking with D32 is not sustainable, especially given BLIP-vqa's demands.
For high-performance, low-latency pipelines, GPU-backed AKS or Azure ML endpoints are the most suitable options. You can also scale GPU workloads dynamically to manage costs during off-peak hours.
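For example, a scheduled job (an Azure Function, a cron task, etc.) could shrink a GPU deployment outside business hours. This sketch reuses the hypothetical endpoint and deployment names from the example above:

```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

# Fetch the current GPU deployment and reduce its instance count off-peak.
deployment = ml_client.online_deployments.get(
    name="blue", endpoint_name="vision-endpoint"
)
deployment.instance_count = 1  # e.g. drop from 3 GPU instances to 1
ml_client.online_deployments.begin_create_or_update(deployment).result()
```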
Some links to help you understand better: