The behavior described (deployment succeeds, but public access times out with 504) aligns with typical Azure App Service issues around app startup, health, or outbound connectivity rather than a pure code or database problem.
A focused checklist based on Azure App Service behavior:
- App Service basics and monitoring
- In the Azure portal, open the App Service and check Monitoring → Metrics for:
- Requests
- CPU time
- Memory working set
- Average memory working set
If requests are reaching the app but responses are slow or failing, these metrics will show high CPU, memory, or long-running requests.
- Also verify overall platform health via Azure Service Health to rule out platform incidents.
- 502/503/504-style behavior and app startup
A 504 returned by a front end such as Application Gateway usually means the app did not respond in time, or did not respond at all. App Service documents similar symptoms for 502/503 errors:
- Common causes:
- Requests taking a long time.
- High memory or CPU usage.
- Exceptions preventing the app from responding.
- Recommended approach:
- Observe and monitor app behavior (metrics as above).
- Collect data (logs, failed request traces, app logs).
- Mitigate (optimize code, increase plan size, fix exceptions).
For a Java/Payara app, slow startup or long first-request initialization can cause the front-end to time out, especially if health probes or warm-up pings are not configured correctly.
- Health checks, warm-up, and startup behavior
Azure App Service can use warm-up pings and health checks to decide when to send organic traffic to an instance:
- WEBSITE_WARMUP_PATH and WEBSITE_WARMUP_STATUSES control how the platform judges a container as ready.
- WEBSITE_WARMUP_STATUSES is a comma-delimited list of HTTP status codes that are considered successful when the platform makes warm-up pings against a newly started container.
- Example: 200,202.
- If the warm-up path returns a status not in this list (for example 502), the platform keeps pinging until a valid status is returned or the container startup timeout is reached.
- If the app never returns a valid status, the platform fails startup and retries, which can surface as 503/504-like behavior to callers.
For Payara:
- Ensure the warm-up/health endpoint:
- Is fast (no heavy DB or initialization logic).
- Returns an allowed status code (200/202) when the app is ready.
- If using Application Gateway or another probe, align its probe path and timeout with the App Service warm-up/health behavior so that the gateway does not time out before the app is considered ready.
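The warm-up evaluation described above can be sketched in Java. This is a minimal illustration, not the platform's implementation: the class name WarmupStatusCheck and its methods are hypothetical, and the URL passed to probeStatus is whatever health path the app exposes. It only shows how a response status can be compared against a comma-delimited list such as 200,202.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical helper: checks a status code against a WEBSITE_WARMUP_STATUSES-style list. */
final class WarmupStatusCheck {

    /** Parses a comma-delimited list such as "200,202" into a set of status codes. */
    static Set<Integer> parseStatuses(String commaDelimited) {
        return Arrays.stream(commaDelimited.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .map(Integer::valueOf)
                .collect(Collectors.toSet());
    }

    /** True when the observed status is one of the accepted statuses. */
    static boolean isAccepted(int observedStatus, String acceptedStatuses) {
        return parseStatuses(acceptedStatuses).contains(observedStatus);
    }

    /** Sends one GET to the given URL; returns the HTTP status, or -1 on error or timeout. */
    static int probeStatus(URI url, Duration timeout) {
        HttpClient client = HttpClient.newBuilder().connectTimeout(timeout).build();
        HttpRequest request = HttpRequest.newBuilder(url).timeout(timeout).GET().build();
        try {
            return client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
        } catch (IOException e) {
            return -1; // connection failure or request timeout
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
    }
}
```

Calling probeStatus against the app's health path and passing the result to isAccepted with "200,202" gives a quick local check of whether the endpoint answers with an accepted status within the chosen timeout.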
- Outbound connectivity and external PostgreSQL (Aiven)
App Service apps can experience intermittent outbound connection issues, often due to SNAT port exhaustion:
- Symptoms include:
- Slow response times.
- Intermittent 5xx / Bad Gateway.
- Time-out errors.
- Inability to connect to external endpoints (for example SQL DB or other services).
For an external PostgreSQL (Aiven):
- Ensure proper connection pooling in the Java stack (JDBC pool such as HikariCP, Tomcat pool, etc.).
- Avoid opening a new DB connection per request; reuse pooled connections to reduce outbound connection churn and SNAT usage.
- If the app uses HTTP-based dependencies as well, pool HTTP connections as recommended (for .NET there is HttpClientFactory; for Java, use pooled HttpClient implementations).
If SNAT exhaustion or outbound connectivity issues occur, the app may hang on DB calls, leading to timeouts and 504s at the front-end.
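To illustrate why pooling limits outbound connection churn, here is a minimal sketch of a fixed-size pool built only on the JDK. The BoundedPool class is hypothetical and deliberately simplified (no validation, no eviction); a real Payara deployment would normally rely on Payara's own JDBC connection pool or a library such as HikariCP rather than code like this.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical fixed-size pool: resources are created once and then reused. */
final class BoundedPool<T> {
    private final BlockingQueue<T> idle;

    /** Creates exactly {@code size} resources up front using the given factory. */
    BoundedPool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    /** Waits up to timeoutMillis for an idle resource; returns null if none becomes free. */
    T borrow(long timeoutMillis) {
        try {
            return idle.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    /** Returns a borrowed resource so that other callers can reuse it. */
    void giveBack(T resource) {
        idle.offer(resource);
    }
}
```

Because the pool never creates more than its configured number of resources, the number of simultaneous outbound connections (and therefore SNAT ports in use) stays bounded, and callers wait briefly instead of opening new connections.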
- Logs and diagnostics to distinguish app vs. platform issues
To determine whether the issue is in the app or the platform:
- Enable diagnostics logs for the web app:
- In the portal: Diagnostics Logs → enable Failed Request Tracing.
- Download failed request traces via Kudu (Tools → Kudu → Debug Console → LogFiles → W3SVC*).
- Use these traces to see:
- Whether requests reach the app.
- Where they stall (for example, during DB calls, during WebSocket upgrade, or during app startup).
This will confirm whether the 504 is due to:
- App not starting or throwing exceptions early.
- Long-running requests (for example, DB connectivity or slow initialization).
- Platform-level issues (less common, but visible via service health and metrics).
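As a complement to the traces, a plain TCP connect with a timeout can help separate outbound reachability from application logic. The sketch below is an assumption-laden illustration: OutboundProbe is a hypothetical class, and the host and port would be those of the external PostgreSQL service. It does not perform TLS or a database login; it only measures whether and how quickly a TCP connection can be opened.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: times a raw TCP connect to an external endpoint. */
final class OutboundProbe {

    /** Returns the connect time in milliseconds, or -1 if the connect fails or times out. */
    static long connectMillis(String host, int port, int timeoutMillis) {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return (System.nanoTime() - start) / 1_000_000;
        } catch (IOException e) {
            // Covers refused connections, unresolved hosts, and connect timeouts.
            return -1;
        }
    }
}
```

If this returns -1 or very high values from inside the App Service instance while the same host is reachable from elsewhere, the problem is more likely outbound networking (SNAT, firewall, allow lists) than application code.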
- WebSockets-specific considerations
No special WebSocket configuration is documented beyond the general App Service environment variables and metrics. In general for App Service:
- WebSockets must be enabled in the App Service configuration.
- There is a read-only WEBSOCKET_CONCURRENT_REQUEST_LIMIT setting that indicates the limit for concurrent WebSocket requests; for Standard tier and higher, it is -1 but still subject to per-VM limits.
Checklist for WebSockets:
- Confirm WebSockets are enabled in the App Service configuration.
- Ensure the Application Gateway (if used) is configured to support WebSockets and not prematurely time out idle connections.
- Verify that the app responds quickly to the initial HTTP upgrade request; slow or blocked upgrade can manifest as timeouts.
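The last check, how quickly the app answers the HTTP upgrade, can be measured with the JDK's built-in WebSocket client (Java 11 or later). This is a sketch under assumptions: WebSocketUpgradeTimer is a hypothetical class, and the ws:// or wss:// endpoint is whatever path the Payara application exposes.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.WebSocket;
import java.time.Duration;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: times the WebSocket opening handshake against an endpoint. */
final class WebSocketUpgradeTimer {

    /** Returns the handshake time in milliseconds, or -1 if it fails or exceeds the timeout. */
    static long handshakeMillis(URI endpoint, Duration timeout) {
        HttpClient client = HttpClient.newHttpClient();
        long start = System.nanoTime();
        try {
            WebSocket socket = client.newWebSocketBuilder()
                    .connectTimeout(timeout)
                    .buildAsync(endpoint, new WebSocket.Listener() { })
                    .get(timeout.toMillis(), TimeUnit.MILLISECONDS);
            long elapsed = (System.nanoTime() - start) / 1_000_000;
            socket.abort(); // only the handshake is measured, so drop the connection
            return elapsed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        } catch (Exception e) {
            // Handshake rejected, connection failed, or timeout reached.
            return -1;
        }
    }
}
```

Running this once against the App Service hostname directly and once through the gateway, if one is in use, can show which hop is slow or rejects the upgrade.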
- App Service vs. Container Apps / other services
A direct comparison of App Service and Container Apps is not attempted here, but two points are relevant:
- App Service is sensitive to application-level issues (startup time, CPU/memory, exceptions) that surface as 502/503/504.
- For more control over container startup, health checks, and networking, container-based services (for example, running Payara in a container with explicit health probes) can sometimes be easier to tune.
Given a Java + Payara + WebSocket + external DB workload:
- App Service is viable if:
- Startup time is kept within platform limits.
- Health/warm-up endpoints are configured correctly.
- Outbound DB connectivity is pooled and stable.
- If very fine-grained control over container lifecycle, probes, and networking is required, container-focused services might be more flexible.
- Direct answers to the numbered questions
1. Settings to verify first for Payara + WebSockets on App Service
- WebSockets enabled in App Service.
- Health/warm-up path returns 200/202 quickly.
- App Service metrics (CPU, memory, requests) to detect slow or failing app behavior.
- Diagnostics logs and failed request tracing to see where requests fail.
2. Health check / startup probe as cause
- Yes. Misconfigured or slow health/warm-up endpoints can cause the platform to repeatedly restart the app or never route traffic to it, resulting in 503/504-like behavior.
- Ensure the warm-up path responds quickly with a status code listed in WEBSITE_WARMUP_STATUSES.
3. Special configuration for WebSockets beyond enabling them
- No additional App Service settings appear to be required beyond enabling WebSockets and respecting the concurrent request limits.
- Ensure any front end (Application Gateway) is configured to support WebSockets and uses appropriate timeouts.
4. Common outbound connectivity / SSL issues to external PostgreSQL
- The main documented class of issues is intermittent outbound connection errors due to SNAT port exhaustion or poor connection management.
- Use proper JDBC connection pooling and avoid excessive connection churn.
5. Port/startup timeout/recycling explaining successful deployment but failed public access
- Yes. If the app takes too long to start, fails health checks, or crashes, the platform can keep recycling it. Deployment will show as successful, but public requests will time out or see 5xx until the app is healthy.
6. Is App Service the right choice?
- App Service is suitable if the app is tuned for its startup and health model and outbound connectivity is managed properly.
- For more control over container behavior and probes, container-centric services may be preferable, although that trade-off is not examined in detail here.