Slow streaming performance with the gpt-5.1 model in the Responses API, particularly with file and image inputs, can stem from several factors. Different models and APIs can exhibit quite different latency characteristics, so the behavior you see with gpt-5.1 will not necessarily match what you see elsewhere.
Here are some considerations that may help:
- Model Latency: Latency varies significantly between models. If gpt-5.1 is consistently slow with file/image inputs, evaluate whether other models (such as gpt-5) or other APIs (such as the Completions API) can meet your needs, since you mention those perform normally.
- Input Size and Complexity: Large files and high-resolution images inherently increase processing time. Where possible, reduce input size, for example by downscaling images or trimming documents to the relevant pages.
- Streaming: Enabling streaming can improve perceived responsiveness by delivering partial output as it is generated, but it does not address underlying latency in producing the first token.
- Content Filtering: Content filtering adds processing overhead and can increase response times. Evaluate whether your workloads could benefit from a modified content filtering policy.
- Region-Specific Performance: Since you are using the East US 2 region, it is worth checking whether there are known issues or capacity constraints specific to that region.
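On the input-size point above, note that images and files sent inline are typically base64-encoded, which inflates the payload by roughly a third before the model even sees it. A quick sketch (the threshold below is an arbitrary illustration, not a documented service limit) for estimating what you are actually uploading:

```python
import base64

def encoded_payload_size(raw_bytes: bytes) -> int:
    """Size in bytes of the base64 string that would be sent for this input."""
    return len(base64.b64encode(raw_bytes))

# Stand-in for a ~3 MB image file read from disk.
raw = b"\x00" * 3_000_000
payload = encoded_payload_size(raw)
print(payload)  # 4,000,000 bytes: base64 adds ~33% overhead

# Hypothetical soft limit for flagging oversized inputs in your own pipeline.
SOFT_LIMIT = 5_000_000
if payload > SOFT_LIMIT:
    print("Consider downscaling or compressing this input before sending.")
```

If your inputs routinely exceed a few megabytes after encoding, downscaling images before the request is usually a cheaper fix than waiting out the latency.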
If the issue persists, consider reaching out to Azure support for more tailored assistance regarding performance optimization with the Responses API and gpt-5.1.
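Before escalating, it helps to quantify the slowdown. A minimal, SDK-agnostic sketch for measuring time-to-first-event and total events on any streaming response iterator (the `fake_stream` generator stands in for a real Responses API stream, which would come from your Azure OpenAI client with streaming enabled):

```python
import time
from typing import Iterable, Tuple

def measure_stream(stream: Iterable) -> Tuple[float, int]:
    """Consume a streaming response; return (seconds to first event, event count)."""
    start = time.perf_counter()
    ttfe = None
    count = 0
    for _event in stream:
        if ttfe is None:
            ttfe = time.perf_counter() - start  # time-to-first-event
        count += 1
    return (ttfe if ttfe is not None else float("nan"), count)

def fake_stream():
    """Stand-in for a real streaming response iterator."""
    for chunk in ("partial ", "output ", "here"):
        yield chunk

ttfe, n = measure_stream(fake_stream())
print(f"first event after {ttfe:.4f}s, {n} events total")
```

Comparing these numbers across gpt-5.1 and gpt-5, and across text-only versus file/image inputs, gives Azure support concrete data to work with.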