Hi schoell,
Greetings and Welcome to Microsoft Q&A! Thanks for posting the question.
I understand that you are experiencing significant latency when passing images to your Azure OpenAI GPT-4o deployment in the East US region.
I attempted to reproduce the issue in my environment, and it works as expected, with responses taking only 3 to 4 seconds. I deployed gpt-4o (2024-08-06) and gpt-4o-mini (2024-07-18) in both the East US and East US 2 regions.
Here are a few potential causes:
- Regional load: increased demand, maintenance, or unexpected operational constraints in a region can cause temporary slowdowns.
- Configuration differences between regions, such as variations in hardware, resource allocation, or deployment settings, may result in inconsistent performance.
To address the issue, consider these steps:
- Monitor Regional Service Health using tools like the Azure Service Health dashboard to identify ongoing issues or incidents in the affected region. Proactive monitoring and routing traffic to alternate regions during peak times can help mitigate latency concerns effectively.
- Optimize the images you send: use efficient formats like JPEG, keep file sizes under 100 KB, and minimize Base64 overhead. Preprocess images by resizing and compressing them, and consider batching or asynchronous requests to reduce latency and improve performance.
- If the issue is intermittent, it could be due to a temporary network or server problem. In that case, retry the request later to see whether the issue has resolved.
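To make the last two points concrete, here is a minimal Python sketch using only the standard library. It checks an image against the 100 KB guideline before Base64-encoding it, and retries a failed request with exponential backoff while timing the successful attempt. The names `base64_size`, `encode_image`, and `call_with_retry` are hypothetical helpers written for this illustration, and `send` stands in for whatever function issues your actual request; none of these are part of the Azure OpenAI SDK.

```python
import base64
import time

MAX_IMAGE_BYTES = 100 * 1024  # the 100 KB guideline mentioned above


def base64_size(raw_len: int) -> int:
    """Length in bytes of the padded Base64 encoding of raw_len bytes."""
    return 4 * ((raw_len + 2) // 3)


def encode_image(raw: bytes, limit: int = MAX_IMAGE_BYTES) -> str:
    """Base64-encode image bytes, rejecting inputs above the size guideline."""
    if len(raw) > limit:
        raise ValueError(
            f"image is {len(raw)} bytes; resize or compress it below {limit}"
        )
    return base64.b64encode(raw).decode("ascii")


def call_with_retry(send, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send() and retry with exponential backoff if it raises.

    send is a hypothetical zero-argument callable wrapping your request.
    Returns (result, seconds taken by the successful attempt).
    In real code, catch only the transient errors your client raises.
    """
    for attempt in range(attempts):
        start = time.perf_counter()
        try:
            result = send()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ...
            continue
        return result, time.perf_counter() - start
```

Base64 encoding adds roughly one third to the payload, so an image of exactly 100 KB (102,400 bytes) becomes 136,536 bytes on the wire. Logging the elapsed time returned by `call_with_retry` for each region is one simple way to compare deployments over time.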
Please refer to the Performance and latency documentation for more details.
I hope this helps. Do let me know if you have any further queries.
Thank you!