gpt-5.1 Responses API - Very slow streaming with file/image inputs

Manuel Stapff 0 Reputation points
2025-11-28T16:44:38.7033333+00:00

Hi everyone,

I'm experiencing significant performance issues with gpt-5.1 in the Responses API when processing inputs that contain files or images. The streaming response speed becomes extremely slow, to the point of being completely unusable for any user-facing application.

Issue Details:

  • Model: gpt-5.1
  • API: Responses API
  • Region: East US 2
  • Reproduction: 100% consistent

Behavior:

  • Extremely slow streaming when gpt-5.1 processes any file input (PDF) or image input
  • Occurs with all three input methods: base64, URL, and data file
  • Occurs regardless of reasoning effort setting (low, medium, high)

What works normally:

  • gpt-5 with same file/image inputs → normal speed
  • gpt-5.1-chat with file/image inputs → normal speed
  • gpt-5.1 in Completions API → normal speed

Since we rely on the Responses API for hosted tools (File Search, Code Interpreter), we can't easily switch to Completions API.

Has anyone else experienced this? Any known workarounds?

Thanks!

Azure OpenAI Service
Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.
{count} votes

1 answer

Sort by: Most helpful
  1. Q&A Assist
    2025-11-28T16:44:49.4566667+00:00
    AI generated content. This question contains an answer created with AI from Q&A Assist. Learn more

    Performance issues with the gpt-5.1 model in the Responses API, particularly with file and image inputs, can be attributed to several factors. While specific details about the slow streaming performance with gpt-5.1 are not explicitly covered in the provided context, it is noted that different models and APIs can exhibit varying performance characteristics.

    Here are some considerations that may help:

    1. Model Latency: The performance of different models can vary significantly. If gpt-5.1 is consistently slow with file/image inputs, it may be beneficial to evaluate if other models (like gpt-5) or APIs (like the Completions API) can meet your needs, as you mentioned they perform normally.
    2. Input Size and Complexity: Large files or complex images can inherently slow down processing times. If possible, try to optimize the size of the inputs or simplify the content being processed.
    3. Streaming: Enabling streaming can sometimes help manage user expectations by providing partial results as they are generated, but it may not resolve underlying latency issues.
    4. Content Filtering: If content filtering is enabled, it can impact response times. Evaluate if your workloads could benefit from modified content filtering policies.
    5. Region-Specific Performance: Since you are using the East US 2 region, it may be worth checking if there are any known issues or performance bottlenecks specific to that region.

    If the issue persists, consider reaching out to Azure support for more tailored assistance regarding performance optimization with the Responses API and gpt-5.1.

    0 comments No comments

Your answer

Answers can be marked as 'Accepted' by the question author and 'Recommended' by moderators, which helps users know the answer solved the author's problem.