gpt-image-2 timeout and resolution constraints

Korale 0 Reputation points
2026-05-07T15:22:54.39+00:00

gpt-image-2-1 deployment in Sweden Central: APIM 60s gateway timeout on streaming requests, plus intermittent server_error at ≥2K resolutions

Azure resource details

  • Resource: korale-gpt-image-2-resource
  • Region: Sweden Central
  • Deployment name: gpt-image-2-1 (model: gpt-image-2, public preview)
  • API version tested: 2025-04-01-preview
  • Endpoint pattern: POST /openai/deployments/gpt-image-2-1/images/generations?api-version=2025-04-01-preview
  • Auth: Api-Key
  • Rate limit: 10 RPM (visible in x-ratelimit-limit-requests header)

Summary

We're integrating gpt-image-2 into a production app via this deployment. The model works correctly for small/fast requests but two distinct failure modes appear at the resolutions we need to ship:


Issue 1 — APIM gateway 60s timeout when no SSE events flow within the first minute (100% reproducible)

When we send a streaming request with partial_images: 0 (which is valid per the Foundry docs — "value between 0 and 3"), no SSE events are emitted by the model until the final image_generation.completed event. For 2048×2048 generations this completion takes 200–250s, but the APIM frontend kills the connection at exactly 60s with:

HTTP/2 408
content-type: application/json

{ "error": { "code": "Timeout", "message": "The operation was timeout." } }

Reproduction (verified twice in the last 30 minutes):

curl -X POST \
  "https://korale-gpt-image-2-resource.services.ai.azure.com/openai/deployments/gpt-image-2-1/images/generations?api-version=2025-04-01-preview" \
  -H "Api-Key: ***" -H "Content-Type: application/json" \
  -d '{"prompt":"a red apple on a white background","n":1,"size":"2048x2048","quality":"high","stream":true,"partial_images":0}'

Failing apim-request-id values:

  • defdcbb8-2d25-4206-a857-3bce95628553 (2026-05-07 13:46:12 UTC, 60.19s)
  • 1b3a6d55-0ea9-40ab-8c39-e544dfc86822 (2026-05-07 ~13:50 UTC, 60.26s)

The request only succeeds if we set partial_images >= 1, because the partial-image events keep the gateway connection alive past the 60s threshold. This effectively makes partial_images: 0 a broken parameter combination at large sizes — please either (a) raise APIM's idle timeout for image streaming endpoints to ≥300s, or (b) emit periodic keep-alive comments on the SSE stream so the gateway sees activity.
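
To illustrate option (b): in the SSE wire format, a line that begins with a colon is a comment that clients discard, so a periodic comment would keep bytes flowing through the gateway without changing what the client receives. Below is a minimal Python sketch of a client-side parser showing this. The function name `parse_sse`, the keep-alive text, and the simplified `data` payload are hypothetical; only the `image_generation.completed` event name comes from the description above.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event, data) pairs from SSE lines, skipping comment lines."""
    event = "message"
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            # SSE comment (for example a keep-alive); it carries no payload.
            continue
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the format allows one optional leading space
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)


# Hypothetical stream: two keep-alive comments, then the final event.
sample = [
    ": keep-alive",
    "",
    ": keep-alive",
    "",
    "event: image_generation.completed",
    'data: {"type":"image_generation.completed"}',  # simplified payload
    "",
]
events = list(parse_sse(sample))
```

Here `events` holds only the final completion event; the comment lines are dropped by the parser, which is why keep-alives of this kind would be invisible to existing clients.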


Issue 2 — Intermittent server_error mid-stream at 2048×2048 (non-deterministic)

With partial_images: 2 keeping the connection warm, the same 2048×2048 request sometimes succeeds (~245s) and sometimes fails after the first or second partial with:

event: error
data: {"type":"error","error":{"type":"server_error","code":null,
       "message":"An error occurred while processing the request.","param":null},
       "sequence_number":2}

We have also seen the more specific variant on the /images/edits endpoint (multipart upload with input images):

data: {"type":"error","error":{"type":"image_generation_server_error",
       "code":"image_generation_failed","message":"Image generation failed",
       "param":null},"sequence_number":0}

These appear to be model-worker crashes. The same prompt at 1024×1024 and the same prompt+size on a retry both succeed, so this isn't a content-policy issue. Could you check internal telemetry for the gpt-image-2-1 deployment in Sweden Central for server_error / image_generation_server_error events around 2026-05-07 13:30–14:00 UTC? Failing apim-request-id available on request — happy to capture more if helpful.
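
For reference, a client could separate these mid-stream failures from other events with a small check like the Python sketch below. The helper name `is_retryable_error` is hypothetical, and treating both error types as retryable is an assumption based only on the retries described above succeeding.

```python
import json

# Error types copied from the two payloads quoted above. Treating them as
# retryable is an assumption, not documented service behavior.
RETRYABLE_TYPES = {"server_error", "image_generation_server_error"}


def is_retryable_error(data: str) -> bool:
    """Return True if an SSE data payload is an error event worth retrying."""
    try:
        payload = json.loads(data)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict) or payload.get("type") != "error":
        return False
    error = payload.get("error") or {}
    return error.get("type") in RETRYABLE_TYPES


generation_error = (
    '{"type":"error","error":{"type":"server_error","code":null,'
    '"message":"An error occurred while processing the request.","param":null},'
    '"sequence_number":2}'
)
edits_error = (
    '{"type":"error","error":{"type":"image_generation_server_error",'
    '"code":"image_generation_failed","message":"Image generation failed",'
    '"param":null},"sequence_number":0}'
)
```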


Baseline (working) request, for comparison

curl -X POST '.../images/generations?api-version=2025-04-01-preview' \
  -d '{"prompt":"a red apple on a white background","n":1,"size":"1024x1024","quality":"low"}'

Returns 200 in ~22s. Working apim-request-id: 6a121483-27eb-467f-a616-4d54fd34c9e3.


What we've already ruled out

  • Not auth / deployment-name issue (lower-resolution requests succeed)
  • Not the moderation parameter (works with and without it)
  • Not the model field in body (we don't send it; deployment is in URL path only)
  • Not a content-policy block (trivial prompt; would surface as contentFilter per docs)
  • Not a client-side timeout (curl with no timeout cap; we see the 408 come back from APIM itself)

What we're asking

  1. Confirm the APIM idle timeout for /images/generations and /images/edits SSE endpoints — and either raise it for the gpt-image-2 deployment or emit SSE keep-alives.
  2. Investigate the intermittent server_error / image_generation_server_error at ≥2K resolutions in Sweden Central.
  3. Are there any region-specific gpt-image-2 deployments that are more stable for production traffic at 2K/4K? (We're a startup using Founders Hub credits and would happily redeploy in a different region if recommended.)

Thanks!

Foundry Models

2 answers

  1. Karnam Venkata Rajeswari 3,070 Reputation points Microsoft External Staff Moderator
    2026-05-20T18:19:44.5366667+00:00

    Hello @Korale

    Welcome to Microsoft Q&A. Thank you for reaching out to us.

    In addition to the inputs provided by Amira Bedhiafi, please check whether the following helps.

    The behavior seen during high‑resolution (2048×2048) image generation using gpt-image-2 can be understood as two connected but distinct aspects—streaming timeout behavior and intermittent backend failures—both influenced by longer processing times at higher resolutions.

    1. Streaming timeout (~60 seconds) when no interim events are emitted
      The first behavior relates to streaming requests that use Server‑Sent Events (SSE). At higher resolutions, total image generation time (~200–250 seconds) far exceeds the period for which the connection is kept open without data transfer.
      • When stream: true and partial_images: 0, no events are emitted until completion
      • During this idle period (~60 seconds observed), the connection is closed before completion
      • This results in HTTP 408 (Timeout)
      Setting partial_images ≥ 1 resolves the issue because:
      • Interim SSE events are generated during processing
      • These events keep the connection active
      • The request remains open until completion
      For stable behavior in streaming scenarios, the recommended settings are:
         stream: true,
         partial_images: 1

      For high-resolution or long-running workloads, prefer:
         stream: false
      
    2. Intermittent server_error at 2048×2048
      At higher resolutions, intermittent failures such as:
      • server_error
      • image_generation_server_error
      may occur during processing. The observed pattern is:
      1. Seen only at 2048×2048
      2. Lower resolutions (1024×1024) are stable
      3. Failures are non-deterministic
      4. Retries often succeed
      This behavior is consistent with transient service-side conditions that can occur during long-running, high-compute operations. The failures are not linked to request structure, authentication, or policy constraints. To improve reliability, please check whether the following help:
      Retry strategy
      1. Use exponential backoff
      2. Retry on transient errors
      3. Example intervals: 5s → 15s → 30s
      Resolution optimization
      1. Prefer 1024×1024 or 1536×1536 where feasible
      2. Reserve 2048×2048 for non-interactive scenarios
      Request pattern tuning
      1. Avoid burst traffic at high resolution
      2. Align request volume with observed throughput
      Streaming alignment
      1. Use partial_images ≥ 1 for streaming stability
      2. Use non-streaming mode for long-running operations
    3. Regional validation considerations
      Service behavior may vary with regional capacity and workload distribution, particularly for preview features and high-compute workloads. Please check the following:
      1. Test the same workload across supported regions (e.g., West Europe, East US)
      2. Compare:
        • Success rate
        • Latency
        • Error frequency
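
The retry strategy from point 2 could look roughly like the Python sketch below. It is only an illustration: `TransientImageError` and `retry_with_backoff` are hypothetical names, the caller is assumed to raise `TransientImageError` when it sees a server_error or image_generation_server_error event, and the default delays are the example intervals listed above.

```python
import time
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


class TransientImageError(Exception):
    """Hypothetical exception raised by the caller for transient failures."""


def retry_with_backoff(
    call: Callable[[], T],
    delays: Sequence[float] = (5.0, 15.0, 30.0),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(); after each transient failure wait the next delay and retry."""
    for delay in delays:
        try:
            return call()
        except TransientImageError:
            sleep(delay)
    # Final attempt: any error now propagates to the caller.
    return call()


# Demonstration with a fake request that fails twice and then succeeds.
# The sleep function is replaced by a recorder so nothing actually waits.
attempts: List[int] = []
waits: List[float] = []


def flaky_request() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise TransientImageError("server_error")
    return "ok"


result = retry_with_backoff(flaky_request, sleep=waits.append)
```

Each retry is a new request and counts against the deployment's rate limit (10 RPM in the question), so the delays should be chosen with that in mind.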

    In summary:

    1. The streaming timeout occurs when no SSE events are emitted during long-running operations; it can be mitigated with partial_images ≥ 1 or non-streaming mode
    2. Intermittent server errors at high resolution are likely transient and workload-related; they can be mitigated through retries and request tuning
    3. Regional differences may influence behavior; cross-region testing can help validate consistency

    The following references might be helpful:

    Configure API for Server-Sent Events in Azure API Management | Microsoft Learn

    Microsoft Foundry Models quotas and limits - Microsoft Foundry | Microsoft Learn

    Please let us know if the response was helpful

    Thank you

  2. Amira Bedhiafi 41,721 Reputation points MVP Volunteer Moderator
    2026-05-12T20:58:40.6+00:00

    Hello Korale!

    Thank you for posting on MS Learn Q&A.

    For your first issue, your observation makes sense. The gpt-image-1 series and gpt-image-2 support streaming with stream: true, and partial_images can be set between 0 and 3.

    However, with partial_images: 0, no intermediate image event is expected: the stream contains only the final completion event, and only the final image is returned.

    So for large or high-quality generations that take more than 60 seconds, the connection can sit idle until the final event.

    As a production workaround, I would not use:

    "stream": true,
    "partial_images": 0

    For 2K or higher image generations, use at least:

    "stream": true,
    "partial_images": 1

    or 2 or 3 if you want more frequent progress events. That keeps the SSE connection active and avoids the case where no event arrives in the first minute.
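
As a rough Python sketch of that workaround, a request builder could force at least one partial image for large sizes. The function name `build_generation_payload` is hypothetical, and reading "2K or higher" as a longest edge of 2048 px or more is an assumption; the field names are the ones used in the question's curl request.

```python
from typing import Any, Dict


def build_generation_payload(
    prompt: str, size: str, partial_images: int = 0, quality: str = "high"
) -> Dict[str, Any]:
    """Build a streaming request body, raising partial_images for large sizes."""
    width, height = (int(part) for part in size.lower().split("x"))
    # Assumption: "2K or higher" means the longest edge is >= 2048 px.
    minimum = 1 if max(width, height) >= 2048 else 0
    return {
        "prompt": prompt,
        "n": 1,
        "size": size,
        "quality": quality,
        "stream": True,
        # partial_images is accepted in the range 0..3.
        "partial_images": max(minimum, min(partial_images, 3)),
    }


large = build_generation_payload("a red apple on a white background", "2048x2048")
small = build_generation_payload("a red apple on a white background", "1024x1024")
```

With the defaults, `large` ends up with partial_images set to 1, while `small` keeps 0.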

    For your second issue, 2048x2048 should be inside the published GPT image 2 resolution constraints: both edges are multiples of 16, the longest edge is at most 3840 px, the aspect ratio is not greater than 3:1, and total pixels do not exceed 8,294,400.

    A 2048×2048 request is around 4.19M pixels, so it should not be rejected purely on resolution.
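
The constraints listed above can be checked with a few lines of Python. This is only a sketch: `check_size` is a hypothetical helper, and treating the 3840 px edge limit as inclusive is an assumption (3840×2160 is exactly 8,294,400 pixels, which suggests it is).

```python
from typing import List

MAX_EDGE = 3840          # assumed to be an inclusive limit
MAX_PIXELS = 8_294_400
MAX_ASPECT = 3           # long edge : short edge


def check_size(width: int, height: int) -> List[str]:
    """Return the list of violated constraints (empty if the size is valid)."""
    problems: List[str] = []
    if width % 16 or height % 16:
        problems.append("both edges must be multiples of 16")
    if max(width, height) > MAX_EDGE:
        problems.append("longest edge exceeds 3840 px")
    if max(width, height) > MAX_ASPECT * min(width, height):
        problems.append("aspect ratio exceeds 3:1")
    if width * height > MAX_PIXELS:
        problems.append("total pixels exceed 8,294,400")
    return problems


pixels_2k = 2048 * 2048
problems_2k = check_size(2048, 2048)
```

For 2048×2048 the pixel count is 4,194,304 and no constraint is violated, which matches the reasoning above.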

    You need to contact Azure support in this case.
