Switching from `dall-e-3` to `gpt-image-1-mini` IS SLOW 💥

Mike-E-angelo 631 Reputation points
2026-04-27T18:26:42.6766667+00:00

I switched over from dall-e-3 to gpt-image-1-mini and what used to take 30-40 seconds to generate an image is now taking over two minutes. Why did you deprecate a model only to replace it with another model that is twice as slow? I thought the -mini was supposed to be a smaller/faster model?

I hope you can understand the concern. How do I use this "new" model so that it is just as fast as, if not faster than, the previous model?

Thank you for any assistance.

Azure OpenAI in Foundry Models

4 answers

  1. Mike-E-angelo 631 Reputation points
    2026-04-28T11:08:29.85+00:00

    After spending a day on this disaster, I settled on gpt-image-1.5 on low quality. I had to greatly modify the prompts over Dall-e 3 which "just got it".

    dall-e-3: album cover art, epic design without any words

    gpt-image-1.5: randomized album cover art, dalle-3 vibes, professional compelling design, high quality and composition, awe-inspiring, imaginative and inspirational, little to no human presence, no words, no text, no legible letters

    GPT image INSISTS on putting a person/human in the image with each and every render! Dall-e 3 was much more chill about this, only 10% of the time? I had to modify to omit this. Such a pain! Well, I guess that pain is over, and I can move on to the next fire.

  2. Mike-E-angelo 631 Reputation points
    2026-04-28T06:29:38.67+00:00

    The image quality of gpt-image-1-mini as compared to dall-e-3 is remarkably different and vastly inferior! You have greatly impacted the quality of my application with this most questionable decision to deprecate a perfectly functional and high quality model!

    Why are you destroying our applications?

  3. Karnam Venkata Rajeswari 3,240 Reputation points Microsoft External Staff Moderator
    2026-04-27T18:33:34.48+00:00

    Hello @Mike-E-angelo,

    Welcome to Microsoft Q&A. Thank you for reaching out to us.

    Thank you for sharing the detailed observation regarding increased latency after moving from dall-e-3 to gpt-image-1-mini. The behavior being observed is understandable and aligns with how different image generation models are designed and optimized.

    The key clarification is that gpt-image-1-mini is not architected as a direct latency-equivalent replacement for dall-e-3. While the “mini” variant focuses on cost efficiency and scalable throughput, this does not necessarily translate to lower per-request response time. In practice, performance characteristics vary depending on how the workload is structured, including request size, output configuration, and concurrency. In Azure OpenAI, both latency (per-call response time) and throughput (overall system capacity) are influenced by workload shape and deployment conditions.

    Generally, a transition between models may result in differences in generation time even when performing similar tasks.

    Key factors contributing to increased latency. The following conditions commonly influence image generation time:

    1. Higher image resolution (for example, 1024×1024) requiring more processing time
    2. Generating multiple images per request (n > 1) increasing total compute workload
    3. Complex or detailed prompts that require additional processing
    4. Operation under shared (pay-as-you-go) capacity where system load may introduce queueing delays

    To improve response time, please check whether the following help:

    1. Optimizing output size and generation parameters
      1. Please try reducing image resolution where the model supports a smaller size (e.g., from 1024×1024 to 512×512)
      2. Limiting the number of images generated per request
    2. Refining workload characteristics by
      1. Simplifying prompts where possible to reduce processing complexity
      2. Distributing requests more evenly to avoid peak usage spikes
    3. Improving throughput handling where applicable
      1. Using parallel requests carefully to improve total output rate
      2. Please note that this improves overall throughput rather than individual latency
    4. Evaluating deployment type for stable performance
      1. For workloads requiring consistent latency, Provisioned Throughput deployments can be considered
      2. These deployments allocate dedicated model capacity and provide predictable latency and throughput behavior
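
The distinction in point 3 between throughput and per-request latency can be illustrated with a small sketch. It does not call any image service: `fake_generate` is a hypothetical stand-in that only sleeps, and the function names and worker count are assumptions chosen for illustration. With a real client, the per-request times would still be whatever the service returns; only the total wall-clock time for the batch shrinks.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(generate, prompt):
    """Run one generation call and return (result, seconds taken)."""
    start = time.perf_counter()
    result = generate(prompt)
    return result, time.perf_counter() - start


def run_parallel(generate, prompts, max_workers=4):
    """Issue requests concurrently.

    Returns (results, per-request latencies, total wall-clock seconds).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of the input prompts.
        outcomes = list(pool.map(lambda p: timed_call(generate, p), prompts))
    wall = time.perf_counter() - start
    results = [result for result, _ in outcomes]
    latencies = [seconds for _, seconds in outcomes]
    return results, latencies, wall


def fake_generate(prompt):
    # Hypothetical stand-in for a real image API call; sleeps to simulate latency.
    time.sleep(0.05)
    return f"image for: {prompt}"


if __name__ == "__main__":
    results, latencies, wall = run_parallel(fake_generate, ["a", "b", "c", "d"])
    print("per-request seconds:", [round(s, 2) for s in latencies])
    print(f"wall clock: {wall:.2f}s, sum of latencies: {sum(latencies):.2f}s")
```

Each individual call takes about as long as it would on its own, so this pattern helps batch jobs but not a user waiting on a single image.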

    With the above optimizations, observable latency improvements are often achievable. However, depending on workload characteristics, generation time may still differ from dall-e-3, as each model is optimized with different performance tradeoffs—typically cost efficiency and scalability versus lowest per-request latency.

    Thank you

  4. AI answer

    2026-04-27T18:26:57.9666667+00:00

    gpt-image-1-mini is optimized for cost efficiency and faster generation at scale, but latency depends heavily on how it is used and on the request parameters, not just on the model name.

    From the documented capabilities:

    • gpt-image-1-mini is described as “Cost-efficient and faster for large-scale or iterative generation”, but it:
      • Only supports fixed resolutions: 1024×1024, 1024×1536, 1536×1024.
      • Uses the same low / medium / high quality options as other image models, with default = medium.
    • Larger models (like GPT-Image-1 / 1.5 / 2) are high‑fidelity, realism‑optimized and generally higher latency and cost.

    To get the best performance from gpt-image-1-mini and approach or beat previous dall-e-3 latency, adjust usage as follows:

    1. Use the smallest acceptable resolution
      Stick to 1024×1024 where possible. Higher resolutions (1024×1536 or 1536×1024) increase generation time.
    2. Lower the quality setting when acceptable
      • Set quality to low or keep it at medium instead of high.
      • low is documented as optimized for latency-sensitive use cases on GPT-Image-2, and the same pattern applies conceptually here: higher quality generally means more latency.
    3. Limit the number of images per request
      • All image models support 1–10 images per request via the n parameter.
      • For best latency, keep n = 1 when only one image is needed.
    4. Avoid unnecessary editing/inpainting passes
      • All models (including gpt-image-1-mini) support inpainting/variations with a mask. Each edit is a separate generation step and adds latency.
      • If possible, generate closer to the final result in one pass instead of multiple edit calls.
    5. Check for external bottlenecks
      Since the model itself is designed to be cost‑efficient and faster for large‑scale or iterative workloads, a jump from ~30–40 seconds to >2 minutes often indicates:
      • Network latency or throttling between the application and the service.
      • Client-side processing (for example, downloading large images, base64 decoding, or saving to slow storage).
      • Increased concurrency or queueing on the client or server side.
    6. Use streaming only if it improves perceived latency
      • For edit scenarios, the image edit API supports a stream parameter that returns partial images as they are generated and a partial_images parameter (1–3).
      • Streaming does not always reduce total wall‑clock time, but it can significantly improve perceived responsiveness.
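
Points 1 to 3 above can be sketched as a small request builder. This is a minimal sketch, not a definitive implementation: it assumes the `openai` Python package, where `client` would be an `AzureOpenAI` (or `OpenAI`) instance and `client.images.generate` accepts `model`, `prompt`, `size`, `quality`, and `n`. The deployment name `gpt-image-1-mini` and the helper names are assumptions; on Azure, `model` is whatever the deployment was named.

```python
# Sizes and quality levels as listed in the answer above.
ALLOWED_SIZES = {"1024x1024", "1024x1536", "1536x1024"}
ALLOWED_QUALITIES = {"low", "medium", "high"}


def build_image_request(prompt, deployment="gpt-image-1-mini",
                        size="1024x1024", quality="low", n=1):
    """Build keyword arguments for a latency-oriented image request.

    Defaults pick the smallest listed size, the lowest quality, and one image.
    """
    if size not in ALLOWED_SIZES:
        raise ValueError(f"unsupported size: {size}")
    if quality not in ALLOWED_QUALITIES:
        raise ValueError(f"unsupported quality: {quality}")
    if not 1 <= n <= 10:
        raise ValueError("n must be between 1 and 10")
    return {
        "model": deployment,  # assumption: Azure deployment name
        "prompt": prompt,
        "size": size,
        "quality": quality,
        "n": n,
    }


def generate_image(client, prompt, **overrides):
    """Send the request; `client` is assumed to be an openai SDK client."""
    return client.images.generate(**build_image_request(prompt, **overrides))
```

Raising `quality` or `size` one step at a time from these defaults makes it easier to see which setting costs the most generation time for a given prompt.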

    If gpt-image-1-mini is still consistently slower than the previous dall-e-3 usage even after:

    • Using 1024×1024 resolution,
    • Keeping n = 1, and
    • Using medium or low quality,

    then the main cause is likely outside the model (network, client processing, or environment changes) rather than the -mini architecture itself.
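
One way to check whether the time is going to the service or to the client is to time each stage separately. The sketch below is illustrative only: `call_api` and `save_bytes` are hypothetical callables supplied by the application, and it assumes the API response is reduced to a base64-encoded image payload before decoding.

```python
import base64
import time
from contextlib import contextmanager


@contextmanager
def stage(timings, name):
    """Record how long the enclosed block takes under timings[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = time.perf_counter() - start


def profile_request(call_api, save_bytes):
    """Time the API call, base64 decoding, and saving as separate stages.

    call_api:   hypothetical callable returning a base64-encoded image payload
    save_bytes: hypothetical callable that stores the decoded bytes
    """
    timings = {}
    with stage(timings, "api_call"):
        b64_payload = call_api()
    with stage(timings, "base64_decode"):
        image_bytes = base64.b64decode(b64_payload)
    with stage(timings, "save"):
        save_bytes(image_bytes)
    return timings


if __name__ == "__main__":
    stored = []
    result = profile_request(lambda: base64.b64encode(b"fake-image"), stored.append)
    for name, seconds in result.items():
        print(f"{name}: {seconds:.4f}s")
```

If `api_call` accounts for nearly all of the two minutes, the delay is on the service side (model, quality setting, or queueing) and not in downloading, decoding, or storage.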


    AI-generated content may be incorrect. Read our transparency notes for more information.
