Slow streaming performance with the gpt-5.1 model in the Responses API, particularly with file and image inputs, can stem from several factors. Different models and APIs can exhibit quite different latency characteristics, so the behavior you see with gpt-5.1 will not necessarily match what you see elsewhere.
Here are some considerations that may help:
- Model Latency: Latency varies significantly between models. If gpt-5.1 is consistently slow with file/image inputs, evaluate whether other models (such as gpt-5) or other APIs (such as the Completions API) can meet your needs, since you mention those perform normally.
- Input Size and Complexity: Large files and high-resolution images inherently increase processing time. Where possible, reduce input size, for example by downscaling images or trimming documents to the relevant pages.
- Streaming: Enabling streaming can improve perceived responsiveness by delivering partial output as it is generated, but it does not address underlying latency in producing the first token.
- Content Filtering: Content filtering adds processing overhead and can increase response times. Evaluate whether your workloads could benefit from a modified content filtering policy.
- Region-Specific Performance: Since you are using the East US 2 region, it is worth checking whether there are known issues or capacity constraints specific to that region.
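On the input-size point above, note that images and files sent inline are typically base64-encoded, which inflates the payload by roughly a third before the model even sees it. A quick sketch (the threshold below is an arbitrary illustration, not a documented service limit) for estimating what you are actually uploading:

```python
import base64

def encoded_payload_size(raw_bytes: bytes) -> int:
    """Size in bytes of the base64 string that would be sent for this input."""
    return len(base64.b64encode(raw_bytes))

# Stand-in for a ~3 MB image file read from disk.
raw = b"\x00" * 3_000_000
payload = encoded_payload_size(raw)
print(payload)  # 4,000,000 bytes: base64 adds ~33% overhead

# Hypothetical soft limit for flagging oversized inputs in your own pipeline.
SOFT_LIMIT = 5_000_000
if payload > SOFT_LIMIT:
    print("Consider downscaling or compressing this input before sending.")
```

If your inputs routinely exceed a few megabytes after encoding, downscaling images before the request is usually a cheaper fix than waiting out the latency.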
If the issue persists, consider reaching out to Azure support for more tailored assistance regarding performance optimization with the Responses API and gpt-5.1.
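Before escalating, it helps to quantify the slowdown. A minimal, SDK-agnostic sketch for measuring time-to-first-event and total events on any streaming response iterator (the `fake_stream` generator stands in for a real Responses API stream, which would come from your Azure OpenAI client with streaming enabled):

```python
import time
from typing import Iterable, Tuple

def measure_stream(stream: Iterable) -> Tuple[float, int]:
    """Consume a streaming response; return (seconds to first event, event count)."""
    start = time.perf_counter()
    ttfe = None
    count = 0
    for _event in stream:
        if ttfe is None:
            ttfe = time.perf_counter() - start  # time-to-first-event
        count += 1
    return (ttfe if ttfe is not None else float("nan"), count)

def fake_stream():
    """Stand-in for a real streaming response iterator."""
    for chunk in ("partial ", "output ", "here"):
        yield chunk

ttfe, n = measure_stream(fake_stream())
print(f"first event after {ttfe:.4f}s, {n} events total")
```

Comparing these numbers across gpt-5.1 and gpt-5, and across text-only versus file/image inputs, gives Azure support concrete data to work with.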