Switching from `dall-e-3` to `gpt-image-1-mini` IS SLOW 💥

Question

Mike-E-angelo 631

I switched over from dall-e-3 to gpt-image-1-mini and what used to take 30-40 seconds to generate an image is now taking over two minutes. Why did you deprecate a model only to replace it with another model that is twice as slow? I thought the -mini was supposed to be a smaller/faster model?

I hope you can understand the concern. How do I use this "new" model so that it is just as fast as, if not faster than, the previous model?

Thank you for any assistance.

Mike-E-angelo 631 Reputation points

2026-04-27T19:04:18.7066667+00:00

Thank you, Karnam, for your reply. It is appreciated. I can confirm that after trying it a few times, it seems to clock in at around 25 seconds. Still seems slow for a "mini" model. However, the much bigger problem at the moment is that the quality of the images is nowhere near what Dall-e-3 had, and it makes no sense to me that you would retire one great/fast model for one that is slower and of lower quality. I hope you can understand the concern.

4 answers

Answer 1

After spending a day on this disaster, I settled on gpt-image-1.5 on low quality. I had to greatly modify the prompts over Dall-e 3 which "just got it".

dall-e-3: album cover art, epic design without any words

gpt-image-1.5: randomized album cover art, dalle-3 vibes, professional compelling design, high quality and composition, awe-inspiring, imaginative and inspirational, little to no human presence, no words, no text, no legible letters

GPT image INSISTS on putting a person/human in the image with each and every render! Dall-e 3 was much more chill about this, only 10% of the time? I had to modify to omit this. Such a pain! Well, I guess that pain is over, and I can move on to the next fire.

Answer 2

Mike-E-angelo 631

The image quality of gpt-image-1-mini as compared to dall-e-3 is remarkably different and vastly inferior! You have greatly impacted the quality of my application with this most questionable decision to deprecate a perfectly functional and high quality model!

Why are you destroying our applications?

Answer 3

Hello @Mike-E-angelo,

Welcome to Microsoft Q&A. Thank you for reaching out to us.

Thank you for sharing the detailed observation regarding increased latency after moving from dall-e-3 to gpt-image-1-mini. The behavior being observed is understandable and aligns with how different image generation models are designed and optimized.

The key clarification is that gpt-image-1-mini is not architected as a direct latency-equivalent replacement for dall-e-3. While the “mini” variant focuses on cost efficiency and scalable throughput, this does not necessarily translate to lower per-request response time. In practice, performance characteristics vary depending on how the workload is structured, including request size, output configuration, and concurrency. In Azure OpenAI, both latency (per-call response time) and throughput (overall system capacity) are influenced by workload shape and deployment conditions.

Generally, a transition between models may result in differences in generation time even when performing similar tasks.

Key factors contributing to increased latency. The following conditions commonly influence image generation time:

- Higher image resolution (for example, 1024×1024) requiring more processing time
- Generating multiple images per request (n > 1) increasing total compute workload
- Complex or detailed prompts that require additional processing
- Operation under shared (pay-as-you-go) capacity where system load may introduce queueing delays

To improve response time, please check whether the following help:

- Optimizing output size and generation parameters
  1. Try reducing image resolution where the model offers a smaller size (e.g., from 1024×1024 to 512×512)
  2. Limit the number of images generated per request
- Refining workload characteristics
  1. Simplify prompts where possible to reduce processing complexity
  2. Distribute requests more evenly to avoid peak usage spikes
- Improving throughput handling where applicable
  1. Use parallel requests carefully to improve total output rate
  2. Please note that this improves overall throughput rather than individual latency
- Evaluating deployment type for stable performance
  1. For workloads requiring consistent latency, Provisioned Throughput deployments can be considered
  2. These deployments allocate dedicated model capacity and provide predictable latency and throughput behavior
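
To illustrate the throughput-versus-latency point above, here is a minimal Python sketch. Everything in it is an assumption for illustration: `fake_generate` is a hypothetical stand-in that only sleeps and does not call Azure OpenAI, and `run_batch` is a made-up helper name. It shows that a bounded pool of parallel requests shortens the total wall time for a batch while each individual call takes just as long as before.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_generate(prompt: str, delay_s: float = 0.05) -> float:
    """Hypothetical stand-in for one image request; returns its own latency in seconds."""
    start = time.perf_counter()
    time.sleep(delay_s)  # placeholder for the real network call (assumption)
    return time.perf_counter() - start


def run_batch(prompts: list[str], max_workers: int) -> tuple[float, list[float]]:
    """Run all prompts through a bounded worker pool.

    Returns (total wall time, list of per-call latencies).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        latencies = list(pool.map(fake_generate, prompts))
    return time.perf_counter() - start, latencies
```

Calling `run_batch(prompts, max_workers=4)` on four prompts should finish in roughly the time of one call, but every entry in the returned latency list stays at about the single-call duration. Keeping `max_workers` small is one way to use parallelism "carefully" without creating the usage spikes mentioned above.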

With the above optimizations, observable latency improvements are often achievable. However, depending on workload characteristics, generation time may still differ from dall-e-3, as each model is optimized with different performance tradeoffs—typically cost efficiency and scalability versus lowest per-request latency.

The following reference might be helpful; please check it out:

Azure OpenAI in Microsoft Foundry Models performance & latency - Microsoft Foundry | Microsoft Learn

Thank you

Answer 4

gpt-image-1-mini is optimized for cost efficiency and faster generation at scale, but latency depends heavily on how it is used and on the request parameters, not just on the model name.

From the documented capabilities:

gpt-image-1-mini is described as “Cost-efficient and faster for large-scale or iterative generation”, but it:
- Only supports fixed resolutions: 1024×1024, 1024×1536, 1536×1024.
- Uses the same low / medium / high quality options as other image models, with default = medium.
Larger models (like GPT-Image-1 / 1.5 / 2) are high‑fidelity, realism‑optimized and generally higher latency and cost.

To get the best performance from gpt-image-1-mini and approach or beat previous dall-e-3 latency, adjust usage as follows:

1. Use the smallest acceptable resolution
   - Stick to 1024×1024 where possible. Higher resolutions (1024×1536 or 1536×1024) increase generation time.
2. Lower the quality setting when acceptable
   - Set quality to low or keep it at medium instead of high.
   - low is optimized for latency-sensitive use cases on GPT-Image-2 and the same pattern applies conceptually: higher quality generally means more latency.
3. Limit the number of images per request
   - All image models support 1–10 images per request via the n parameter.
   - For best latency, keep n = 1 when only one image is needed.
4. Avoid unnecessary editing/inpainting passes
   - All models (including gpt-image-1-mini) support inpainting/variations with a mask. Each edit is a separate generation step and adds latency.
   - If possible, generate closer to the final result in one pass instead of multiple edit calls.
5. Check for external bottlenecks
   Since the model itself is designed to be cost‑efficient and faster for large‑scale or iterative workloads, a jump from ~30–40 seconds to >2 minutes often indicates:
   - Network latency or throttling between the application and the service.
   - Client-side processing (for example, downloading large images, base64 decoding, or saving to slow storage).
   - Increased concurrency or queueing on the client or server side.
6. Use streaming only if it improves perceived latency
   - For edit scenarios, the image edit API supports a stream parameter that returns partial images as they are generated and a partial_images parameter (1–3).
   - Streaming does not always reduce total wall‑clock time, but it can significantly improve perceived responsiveness.
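
The resolution, quality, and n settings above can be pinned down in code before a request is ever sent. The following is a minimal Python sketch, not an official client: `build_image_params` is a hypothetical helper, the dictionary keys simply mirror the parameter names used in this answer (size, quality, n), and how the resulting dictionary is passed to your SDK or REST call is an assumption left to the reader. In Azure the `model` value would typically be your own deployment name.

```python
# Values taken from the limits listed in this answer.
ALLOWED_SIZES = {"1024x1024", "1024x1536", "1536x1024"}
ALLOWED_QUALITIES = {"low", "medium", "high"}


def build_image_params(prompt, size="1024x1024", quality="low", n=1,
                       model="gpt-image-1-mini"):
    """Hypothetical helper: validate and assemble latency-friendly parameters.

    Defaults follow the advice above: smallest size, low quality, one image.
    """
    if size not in ALLOWED_SIZES:
        raise ValueError(f"unsupported size {size!r}; expected one of {sorted(ALLOWED_SIZES)}")
    if quality not in ALLOWED_QUALITIES:
        raise ValueError(f"unsupported quality {quality!r}")
    if not 1 <= n <= 10:
        raise ValueError("n must be between 1 and 10")
    return {"model": model, "prompt": prompt, "size": size, "quality": quality, "n": n}
```

Validating on the client side means an unsupported size such as 512x512 fails immediately instead of costing a round trip to the service.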

If gpt-image-1-mini is still consistently slower than the previous dall-e-3 usage even after:

Using 1024×1024 resolution,
Keeping n = 1, and
Using medium or low quality,

then the main cause is likely outside the model (network, client processing, or environment changes) rather than the -mini architecture itself.
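
One way to check whether the time is going to the model or to something around it is to time each stage of the client pipeline separately. This is a small Python sketch under stated assumptions: `timed` and `slowest_stage` are hypothetical helper names, and the stage labels ("request", "decode", "save") are only examples of the steps named above.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(stage, timings):
    """Add the elapsed seconds of the wrapped block to timings[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = timings.get(stage, 0.0) + (time.perf_counter() - start)


def slowest_stage(timings):
    """Return the name of the stage with the largest accumulated time."""
    return max(timings, key=timings.get)
```

Wrap the API call in `with timed("request", timings):`, the base64 decoding in `with timed("decode", timings):`, and the file write in `with timed("save", timings):`. If "request" dominates, the delay is on the service or network side; if "decode" or "save" dominates, the bottleneck is in the client.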
