Responses API slower than Chat Completions

Sucharda Edward 0 Reputation points
2025-08-18T12:17:44.9133333+00:00

Hi everyone,

I have been comparing the Chat Completions API and the Responses API for the GPT-4o model (API version: 2025-04-01-preview). According to the OpenAI documentation (https://platform.openai.com/docs/guides/migrate-to-responses), the Responses API is recommended. However, in my tests, it consistently shows higher latency.

Testing on an Azure OpenAI resource located in Sweden Central, I observed that the Responses API is usually about twice as slow (1 second vs 0.5 seconds), but sometimes the difference is even greater—up to 9 times slower (2.9 seconds vs 0.3 seconds).

Is this normal? Has it been confirmed that the Responses API is slower than Chat Completion? Or could this be related to the region or current service status?

Thank you for any insights!

Azure OpenAI in Foundry Models

1 answer

  1. Amira Bedhiafi 41,641 Reputation points MVP Volunteer Moderator
    2025-09-01T21:09:38.12+00:00

    Hello Sucharda!

    Thank you for posting on Microsoft Learn Q&A.

    A small latency bump with Responses is expected, but a consistent 2×–9× gap usually isn’t. It’s more likely due to how you’re using Responses, the preview API path, or regional load (Sweden Central had recent slow-response reports).
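To check whether the gap is consistent or driven by a few outliers, it can help to time both endpoints the same way and compare medians and tail values rather than single requests. The following is a minimal sketch, not an official benchmark: `measure_latency` and `summarize` are hypothetical helper names, and the callable you pass in is assumed to wrap one request to your own deployment.

```python
import math
import statistics
import time
from typing import Callable, Dict, List


def measure_latency(call: Callable[[], object], runs: int = 20, warmup: int = 2) -> List[float]:
    """Return wall-clock latencies (seconds) for `runs` invocations of `call`."""
    for _ in range(warmup):
        call()  # discard warm-up calls (connection setup, caches)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples: List[float]) -> Dict[str, float]:
    """Median, nearest-rank 95th percentile, and maximum of the samples."""
    ordered = sorted(samples)
    p95_index = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return {
        "median": statistics.median(ordered),
        "p95": ordered[p95_index],
        "max": ordered[-1],
    }
```

Assuming the `openai` Python SDK, you would pass something like `lambda: client.chat.completions.create(...)` for one run and `lambda: client.responses.create(...)` for the other, using the same prompt and deployment, then compare the two `summarize` results. A large `max` with a similar `median` points toward regional load or outliers rather than a systematic endpoint difference.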

    The Responses endpoint adds orchestration for things like tool use and structured outputs, so there’s a small overhead compared with the older Chat Completions path. OpenAI recommends migrating because Responses is a feature superset, not because it’s faster per se.

    https://platform.openai.com/docs/guides/migrate-to-responses

    If you’re using structured outputs or json_schema, the first request per schema has extra latency while the schema is processed and cached. Subsequent calls with the same schema should not pay that cost.
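One way to see whether schema caching explains part of the gap is to time the first request separately from the following ones. This is a rough sketch under assumptions: `cold_vs_warm` is a hypothetical helper, `call` is assumed to send one request with a fixed schema, and the `clock` parameter exists only so the timing source can be swapped out.

```python
import statistics
import time
from typing import Callable, Dict


def cold_vs_warm(
    call: Callable[[], object],
    warm_runs: int = 5,
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[str, float]:
    """Time the first call on its own, then the median of `warm_runs` later calls."""

    def timed() -> float:
        start = clock()
        call()
        return clock() - start

    cold = timed()  # first request: may include one-time schema processing
    warm = [timed() for _ in range(warm_runs)]  # later requests with the same schema
    return {"cold": cold, "warm_median": statistics.median(warm)}
```

If `cold` is much larger than `warm_median`, the extra latency is probably a one-time cost for that schema; if the two are similar, the slowdown likely comes from somewhere else.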

    https://openai.com/index/introducing-structured-outputs-in-the-api/

    https://github.com/vercel/ai/discussions/3656

    There were community reports of unusually slow responses for GPT-4.x in Sweden Central around mid-August 2025, the same timeframe as your tests.

    https://learn.microsoft.com/en-us/answers/questions/5524170/azure-openai-extremely-slow-response-times-in-swed

    Preview paths can have extra instrumentation and change more frequently. Azure’s latest GA inference API version as of late August 2025 is 2024-10-21, which may be steadier.

    https://learn.microsoft.com/en-us/azure/ai-foundry/openai/api-version-lifecycle
