Based on the provided information, the behavior is consistent with an outbound connectivity problem from Azure App Service rather than with authentication or the Azure Python SDK itself.
For apps and functions hosted on Azure App Service, intermittent or sudden failures when calling external endpoints (including Azure Resource Manager) are typically caused by hitting outbound connection limits:
- SNAT port exhaustion to management.azure.com
- Each App Service instance initially has a preallocated quota of 128 SNAT ports for outbound connections to a given address/port combination.
- When applications rapidly open new outbound connections to the same destination (for example, repeated calls to https://management.azure.com), the SNAT ports for that destination can be exhausted.
- Once exhausted, new outbound connections can intermittently fail until SNAT ports are reclaimed (the Azure load balancer waits about four minutes after connection close before reclaiming a port).
- Symptoms of SNAT exhaustion include intermittent 5xx/Bad Gateway errors, timeouts, and failures to connect to external endpoints, which matches the described impact on the App Service apps.
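As a rough back-of-the-envelope illustration of the numbers above, the sketch below estimates the sustained rate of new connections to a single destination that 128 ports can support if each port stays unavailable for the connection's lifetime plus the roughly four-minute reclaim delay. The function name and the simplified model (no connection reuse, no ports allocated beyond the initial quota) are assumptions for illustration, not a description of how the platform actually allocates ports.

```python
def max_sustained_new_connections_per_second(
    snat_ports: int = 128,
    reclaim_delay_seconds: float = 240.0,
    connection_lifetime_seconds: float = 0.0,
) -> float:
    """Estimate how many new connections per second one destination can sustain.

    Simplified model (an assumption): every new connection holds one SNAT port
    for its lifetime plus the reclaim delay, and no connection is ever reused.
    """
    hold_time = connection_lifetime_seconds + reclaim_delay_seconds
    if hold_time <= 0:
        raise ValueError("total port hold time must be positive")
    return snat_ports / hold_time


if __name__ == "__main__":
    rate = max_sustained_new_connections_per_second()
    print(f"~{rate:.2f} new connections/second before ports run out")
```

Under these assumptions, an app that opens a fresh connection to the same destination more often than about once every two seconds would eventually use up the initial quota, which is why connection reuse matters.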
- How to confirm whether SNAT is the cause
Use App Service Diagnostics to inspect SNAT port allocation for the affected apps:
- In the Azure portal, open each App Service.
- Select Diagnose and solve problems.
- Choose Availability and Performance.
- Select the SNAT Port Exhaustion tile.
This view shows SNAT port allocation information for the site. The recommended practice is to keep usage below 128. If more detailed data is required, open a support ticket so support can pull backend metrics.
- Distinguish SNAT vs. total TCP connections
- SNAT ports are used only for external flows (such as calls to management.azure.com).
- Total TCP connections also include local/loopback connections and are tracked separately.
- The TCP connections limit is per worker instance and is not used directly for SNAT limiting.
A separate detector for TCP connections is also available under Diagnose and solve problems by searching for “TCP connections.”
- Mitigation and long‑term fixes
To avoid or reduce SNAT port exhaustion for outbound calls from App Service:
- Use connection pooling in the application code
Ensure HTTP connections to Azure Resource Manager are pooled and reused instead of opening new connections for each request. For HTTP-based calls, follow the guidance to pool HTTP connections (for example, using HttpClientFactory in .NET; the same principle applies in other languages). This reduces the rate at which new SNAT ports are consumed.
- Use service endpoints or private endpoints where applicable
For Azure services that support service endpoints or private endpoints, integrate the App Service with a regional virtual network and configure service endpoints or private endpoints. Outbound traffic from the app to those services then bypasses SNAT limits.
- Use a NAT gateway for external endpoints
For destinations that must be reached over the public internet (such as management.azure.com), integrate the App Service with a virtual network and attach a NAT gateway to the integration subnet. A NAT gateway provides up to 64K outbound SNAT ports and a dedicated outbound IP address for traffic from that subnet.
- Combine strategies
The most robust approach is to combine these mitigations: use connection pooling in the application, service/private endpoints for Azure services that support them, and a NAT gateway for remaining external endpoints.
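To illustrate why connection pooling helps, the following self-contained sketch starts a local stand-in HTTP server (not management.azure.com) and counts the TCP connections it accepts when a client opens a new connection per request versus reusing one keep-alive connection. `CountingHandler` and `count_connections` are hypothetical names introduced for this demo; against a real external endpoint, each of those new connections would typically need its own SNAT port.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    """Local stand-in endpoint that counts the TCP connections it accepts."""

    protocol_version = "HTTP/1.1"  # allow keep-alive so a connection can be reused
    connections_opened = 0
    _lock = threading.Lock()

    def setup(self):
        # setup() runs once per accepted TCP connection, not once per request.
        super().setup()
        with CountingHandler._lock:
            CountingHandler.connections_opened += 1

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def count_connections(requests: int, reuse: bool) -> int:
    """Send `requests` GETs and return how many TCP connections the server saw."""
    CountingHandler.connections_opened = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    shared = http.client.HTTPConnection(host, port) if reuse else None
    try:
        for _ in range(requests):
            conn = shared if reuse else http.client.HTTPConnection(host, port)
            conn.request("GET", "/")
            conn.getresponse().read()  # drain the body so the socket can be reused
            if not reuse:
                conn.close()
    finally:
        if shared is not None:
            shared.close()
        server.shutdown()
        server.server_close()
    return CountingHandler.connections_opened


if __name__ == "__main__":
    print("new connection per request:", count_connections(5, reuse=False))
    print("one reused connection:     ", count_connections(5, reuse=True))
```

The demo uses only the standard library to stay self-contained; in application code, the same effect usually comes from a pooled HTTP client or a long-lived SDK client rather than hand-managed connections.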
- Next steps for this specific scenario
- Use App Service Diagnostics → SNAT Port Exhaustion to confirm whether SNAT usage is high at the time of failures.
- If SNAT exhaustion is confirmed, implement connection pooling in the Django application’s HTTP client used by the Azure Python SDK (or ensure the SDK is configured to reuse connections).
- If code changes are not immediately possible, consider moving the apps to a plan with regional VNet integration and a NAT gateway to increase available SNAT ports and change the outbound path.
- If diagnostics do not show SNAT pressure yet the issue persists, open an Azure support ticket, referencing the time range and the error details, so support can correlate the Ref A/B/C identifiers with backend logs.
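For the connection-reuse step, one common pattern in a Django app is to build the SDK client once per process and share it across requests instead of constructing it inside each view. The sketch below shows that pattern with a placeholder class, assuming (as is typical for HTTP-based SDK clients) that the client object owns the underlying session and connection pool. `ArmClientStandIn`, `get_arm_client`, and `handle_request` are hypothetical names; the real client construction is indicated only in a comment.

```python
from functools import lru_cache


class ArmClientStandIn:
    """Placeholder for a real SDK client, which would own an HTTP connection pool."""

    instances_created = 0

    def __init__(self) -> None:
        ArmClientStandIn.instances_created += 1


@lru_cache(maxsize=1)
def get_arm_client() -> ArmClientStandIn:
    """Build the client on first use and return the cached instance afterwards.

    In a real app this is where the SDK client would be constructed, for example
    ResourceManagementClient(DefaultAzureCredential(), subscription_id), which
    requires the azure-identity and azure-mgmt-resource packages.
    Note: if several threads make the very first call at the same time, the
    factory may run more than once before the result is cached.
    """
    return ArmClientStandIn()


def handle_request() -> ArmClientStandIn:
    """Hypothetical view helper: every call reuses the shared client."""
    return get_arm_client()


if __name__ == "__main__":
    first = handle_request()
    second = handle_request()
    print("same client reused:", first is second)
    print("clients created:", ArmClientStandIn.instances_created)
```

Whether this reduces SNAT usage in practice depends on how the SDK manages its transport, so it is worth confirming the effect in the SNAT Port Exhaustion detector after deploying the change.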