gpt-image-2-1 deployment in Sweden Central: APIM 60s gateway timeout on streaming requests, plus intermittent server_error at ≥2K resolutions
Azure resource details
- Resource: korale-gpt-image-2-resource
- Region: Sweden Central
- Deployment name: gpt-image-2-1 (model: gpt-image-2, public preview)
- API version tested: 2025-04-01-preview
- Endpoint pattern: POST /openai/deployments/gpt-image-2-1/images/generations?api-version=2025-04-01-preview
- Auth: Api-Key
- Rate limit: 10 RPM (visible in x-ratelimit-limit-requests header)
Summary
We're integrating gpt-image-2 into a production app via this deployment. The model works correctly for small, fast requests, but two distinct failure modes appear at the resolutions we need to ship (2048×2048 and above):
Issue 1 — APIM gateway 60s timeout when no SSE events flow within the first minute (100% reproducible)
When we send a streaming request with partial_images: 0 (valid per the Foundry docs: "value between 0 and 3"), the model emits no SSE events until the final image_generation.completed event. For 2048×2048 generations, completion takes 200–250s, but the APIM frontend closes the connection at ~60s (60.19s and 60.26s in the two runs listed below) with:
HTTP/2 408
content-type: application/json
{ "error": { "code": "Timeout", "message": "The operation was timeout." } }
Reproduction (verified twice on 2026-05-07, 13:46–13:50 UTC):
curl -X POST \
"https://korale-gpt-image-2-resource.services.ai.azure.com/openai/deployments/gpt-image-2-1/images/generations?api-version=2025-04-01-preview" \
-H "Api-Key: ***" -H "Content-Type: application/json" \
-d '{"prompt":"a red apple on a white background","n":1,"size":"2048x2048","quality":"high","stream":true,"partial_images":0}'
Failing apim-request-id values:
- defdcbb8-2d25-4206-a857-3bce95628553 (2026-05-07 13:46:12 UTC, 60.19s)
- 1b3a6d55-0ea9-40ab-8c39-e544dfc86822 (2026-05-07 ~13:50 UTC, 60.26s)
The request only succeeds if we set partial_images >= 1, because the partial-image events keep the gateway connection alive past the 60s threshold. This effectively makes partial_images: 0 a broken parameter combination at large sizes — please either (a) raise APIM's idle timeout for image streaming endpoints to ≥300s, or (b) emit periodic keep-alive comments on the SSE stream so the gateway sees activity.
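Option (b) should be transparent to spec-compliant clients: in the SSE format, a line that begins with a colon is a comment and parsers ignore it. A minimal sketch of a client-side filter that shows this, assuming the keep-alive would arrive as a comment line such as `: keep-alive` (the comment text is our assumption, and `sse_event_names` is a hypothetical helper, not part of any SDK):

```shell
# sse_event_names: read an SSE stream on stdin and print each event name.
# Comment lines (starting with ":") are skipped, as the SSE format requires,
# so gateway keep-alives would never reach application code.
sse_event_names() {
  sse_cr=$(printf '\r')
  while IFS= read -r sse_line || [ -n "$sse_line" ]; do
    sse_line=${sse_line%"$sse_cr"}      # tolerate CRLF line endings
    case "$sse_line" in
      :*) ;;                            # SSE comment (e.g. keep-alive): ignore
      "event:"*)
        sse_line=${sse_line#event:}
        printf '%s\n' "${sse_line# }"   # strip the one optional leading space
        ;;
    esac
  done
}
```

Piping the reproduction command through it (`curl -sS -N ... | sse_event_names`) would print only image_generation.completed for a successful partial_images: 0 request, with or without keep-alive comments on the wire.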
Issue 2 — Intermittent server_error mid-stream at 2048×2048 (non-deterministic)
With partial_images: 2 keeping the connection warm, the same 2048×2048 request sometimes succeeds (~245s) and sometimes fails after the first or second partial with:
event: error
data: {"type":"error","error":{"type":"server_error","code":null,
"message":"An error occurred while processing the request.","param":null},
"sequence_number":2}
We have also seen the more specific variant on the /images/edits endpoint (multipart upload with input images):
data: {"type":"error","error":{"type":"image_generation_server_error",
"code":"image_generation_failed","message":"Image generation failed",
"param":null},"sequence_number":0}
These appear to be model-worker crashes. The same prompt succeeds at 1024×1024, and the same prompt and size succeeds on retry, so this isn't a content-policy issue. Could you check internal telemetry for the gpt-image-2-1 deployment in Sweden Central for server_error / image_generation_server_error events around 2026-05-07 13:30–14:00 UTC? Failing apim-request-id values are available on request, and we're happy to capture more if helpful.
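Because a retry of the same request succeeds, a client-side retry is a possible stopgap while this is investigated. A minimal sketch of such a loop, assuming a repeated attempt is safe for the caller; `retry_with_backoff` and its arguments are hypothetical names, and the delays would need to be chosen to stay under the 10 RPM limit:

```shell
# retry_with_backoff MAX_ATTEMPTS BASE_DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits 0 or MAX_ATTEMPTS is reached, doubling the
# delay after each failure. Returns the exit status of the last attempt.
retry_with_backoff() {
  rwb_max=$1
  rwb_delay=$2
  shift 2
  rwb_attempt=1
  while :; do
    "$@" && return 0
    rwb_status=$?                       # exit status of the failed COMMAND
    if [ "$rwb_attempt" -ge "$rwb_max" ]; then
      return "$rwb_status"
    fi
    sleep "$rwb_delay"
    rwb_delay=$((rwb_delay * 2))        # exponential backoff
    rwb_attempt=$((rwb_attempt + 1))
  done
}
```

The wrapped command has to exit non-zero itself when the stream contains an `event: error` line, since the error arrives after the response has already started and curl's own exit status would presumably not reflect it.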
Baseline (working) request, for comparison
curl -X POST '.../images/generations?api-version=2025-04-01-preview' \
-d '{"prompt":"a red apple on a white background","n":1,"size":"1024x1024","quality":"low"}'
Returns 200 in ~22s. Working apim-request-id: 6a121483-27eb-467f-a616-4d54fd34c9e3.
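Request IDs and timings like the ones in this report can be collected by having curl dump the response headers (`-D`) and print the total time (`-w '%{time_total}'`). A minimal sketch of the header extraction, where `apim_request_id` is a hypothetical helper name and the header dump is read from stdin:

```shell
# apim_request_id: print the apim-request-id value from a curl header dump
# (as written by `curl -D FILE`) read on stdin. Header names are compared
# case-insensitively, and CR characters from CRLF line endings are removed.
apim_request_id() {
  tr -d '\r' | awk -F': *' 'tolower($1) == "apim-request-id" { print $2; exit }'
}
```

For example, after adding `-D headers.txt -w '%{time_total}\n'` to either curl command, `apim_request_id < headers.txt` prints the ID to quote in a follow-up.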
What we've already ruled out
- Not auth / deployment-name issue (lower-resolution requests succeed)
- Not the moderation parameter (works with and without it)
- Not the model field in body (we don't send it; deployment is in URL path only)
- Not a content-policy block (trivial prompt; would surface as contentFilter per docs)
- Not a client-side timeout (curl with no timeout cap; we see the 408 come back from APIM itself)
What we're asking
- Confirm the APIM idle timeout for /images/generations and /images/edits SSE endpoints — and either raise it for the gpt-image-2 deployment or emit SSE keep-alives.
- Investigate the intermittent server_error / image_generation_server_error at ≥2K resolutions in Sweden Central.
- Are there regions where gpt-image-2 deployments are more stable for production traffic at 2K/4K? (We're a startup using Founders Hub credits and would happily redeploy in a different region if recommended.)
Thanks!