How to resolve intermittent 503 errors with high CPU utilization in Azure App Service during peak traffic spikes while maintaining SLA?
We have a .NET 6 API hosted on Azure App Service (Premium P3v2 tier) that experiences intermittent HTTP 503 errors during unpredictable traffic spikes (roughly 10x baseline). Application Insights shows CPU saturation (~95%) coinciding with the errors, but auto-scaling (triggered at CPU > 70%) often lags behind demand.
Current Configuration:
- Instances: 3 (min), 10 (max)
- Scale-out: CPU > 70% for 5 minutes, +1 instance
- ARR Affinity: Disabled
- Health Check: /status (200 OK endpoint; see the probe sketch after this list)
- Database: Azure SQL (100 DTUs, no throttling observed)
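For context, the /status probe is a minimal ASP.NET Core health check wired up roughly as in the sketch below (illustrative only; today it is liveness-only, though a SQL/Redis dependency check could be added so the App Service health check rotates out bad instances):

```csharp
// Program.cs (.NET 6 minimal hosting) -- illustrative sketch of the /status probe
using Microsoft.Extensions.Diagnostics.HealthChecks;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddControllers();

// Liveness-only today; a SQL/Redis dependency check could be added here so the
// platform health check pulls saturated or faulted instances out of rotation.
builder.Services.AddHealthChecks()
    .AddCheck("self", () => HealthCheckResult.Healthy());

var app = builder.Build();

// App Service "Health check" pings this path and removes instances that fail it.
app.MapHealthChecks("/status");
app.MapControllers();

app.Run();
```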
Attempted Fixes (No Success):
- Pre-warmed instances via Always On + startup tasks.
- Adjusted scale-out rules to trigger at 60% CPU with 3-minute cooldown (resulted in over-provisioning without eliminating 503s).
- Optimized code (reduced EF Core queries, added caching via Redis).
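The Redis caching mentioned in the last item is a cache-aside wrapper along these lines (sketch only; assumes the Microsoft.Extensions.Caching.StackExchangeRedis package registered via AddStackExchangeRedisCache, and CachedReads / loadFromDb are illustrative names standing in for the real EF Core queries):

```csharp
using System;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Distributed;

// Cache-aside read path used to cut repeated EF Core round trips during spikes.
public class CachedReads
{
    private readonly IDistributedCache _cache;

    public CachedReads(IDistributedCache cache) => _cache = cache;

    public async Task<T?> GetOrAddAsync<T>(
        string key,
        Func<Task<T?>> loadFromDb,   // the existing EF Core query, passed in by the caller
        TimeSpan ttl,
        CancellationToken ct = default)
    {
        // 1. Try Redis first.
        var cached = await _cache.GetStringAsync(key, ct);
        if (cached is not null)
            return JsonSerializer.Deserialize<T>(cached);

        // 2. Fall back to SQL, then populate the cache with a short TTL so hot
        //    keys absorb most of the spike instead of CPU/DTUs.
        var value = await loadFromDb();
        if (value is not null)
        {
            await _cache.SetStringAsync(
                key,
                JsonSerializer.Serialize(value),
                new DistributedCacheEntryOptions { AbsoluteExpirationRelativeToNow = ttl },
                ct);
        }
        return value;
    }
}
```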
Hard Requirements:
- Must maintain 99.95% SLA.
- Cannot use ASE (App Service Environment) due to cost constraints.
Question:
What’s a deterministic strategy to eliminate 503s under these conditions? Are there hidden Azure quotas (e.g., SNAT, VMSS burst limits) or advanced scaling patterns (predictive, queue-based) that could resolve this? Provide low-latency solutions, not theoretical guidance.
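On the queue-based pattern in particular, what I have in mind is classic queue-based load leveling: the API enqueues work and returns quickly, while a background worker drains the queue at a sustainable rate so bursts are absorbed by queue depth rather than request-thread CPU. A minimal sketch of the consumer side, assuming the Azure.Storage.Queues SDK; OrderQueueWorker, the queue name, and ProcessAsync are hypothetical:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Azure.Storage.Queues;
using Microsoft.Extensions.Hosting;

// Background worker that drains the queue at a bounded rate, so traffic bursts
// pile up as queue depth instead of saturating request threads.
public class OrderQueueWorker : BackgroundService
{
    private readonly QueueClient _queue;

    public OrderQueueWorker(string connectionString)
        => _queue = new QueueClient(connectionString, "incoming-orders"); // hypothetical queue name

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        await _queue.CreateIfNotExistsAsync(cancellationToken: stoppingToken);

        while (!stoppingToken.IsCancellationRequested)
        {
            // Pull a small batch; anything not yet processed simply waits in the queue.
            var messages = await _queue.ReceiveMessagesAsync(maxMessages: 16, cancellationToken: stoppingToken);

            foreach (var msg in messages.Value)
            {
                await ProcessAsync(msg.Body.ToString(), stoppingToken);                  // hypothetical handler
                await _queue.DeleteMessageAsync(msg.MessageId, msg.PopReceipt, stoppingToken);
            }

            if (messages.Value.Length == 0)
                await Task.Delay(TimeSpan.FromSeconds(1), stoppingToken);                // idle back-off
        }
    }

    private Task ProcessAsync(string payload, CancellationToken ct) => Task.CompletedTask; // placeholder
}
```

The enqueue side would just be the existing controllers calling QueueClient.SendMessageAsync and returning 202 Accepted. My question is whether this pattern (or predictive autoscale) is the right lever here, or whether platform limits such as SNAT exhaustion are the actual cause of the 503s.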