Hi Emil Lienemann,
It sounds like you're encountering significant latency issues when using structured outputs with function calling in the Azure OpenAI service.
- Reduce JSON complexity: minimize the JSON schema size, or split a large response across multiple function calls, to cut the processing overhead the schema adds to each request.
- Lower max_tokens: a tighter max_tokens cap bounds the output length, so responses finish sooner and runaway generations are avoided.
- Use streaming: consume the response as tokens arrive instead of waiting for the full completion, which improves perceived latency and responsiveness.
- Test a simpler function-calling schema: fewer properties and less nesting reduce the model's processing time and speed up output generation.
- Try different Azure regions or model versions: latency varies with regional resource availability and service load, so some combinations respond noticeably faster.
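The first three points can be sketched in Python. This is only an illustration of the request shape, not a live call: the deployment name, the schema, and the simulated stream below are placeholders I made up, and the chunk handling accepts plain strings so the example runs without Azure credentials. With the `openai` SDK you would pass `build_request(...)` to `client.chat.completions.create` on an `AzureOpenAI` client instead.

```python
import json

# A deliberately compact schema: fewer properties and less nesting mean
# fewer schema tokens for the model to honor on every request.
COMPACT_SCHEMA = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "temperature_c": {"type": "number"},
    },
    "required": ["city", "temperature_c"],
    "additionalProperties": False,
}

def build_request(messages, deployment="my-deployment"):
    """Assemble request kwargs with a low max_tokens cap and streaming on.

    `deployment` is a placeholder for your Azure deployment name.
    """
    return {
        "model": deployment,
        "messages": messages,
        "max_tokens": 256,  # cap output length so responses finish sooner
        "stream": True,     # receive tokens as they are generated
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "weather_report",
                "strict": True,
                "schema": COMPACT_SCHEMA,
            },
        },
    }

def collect_stream(chunks):
    """Concatenate streamed text deltas into the final JSON string.

    `chunks` stands in for the delta strings you would pull out of the
    chunk objects yielded by the streaming API; plain strings keep this
    sketch self-contained and runnable offline.
    """
    return "".join(chunks)

# Simulated stream in place of a live Azure call:
simulated = ['{"city": "Oslo", ', '"temperature_c": 4.5}']
result = json.loads(collect_stream(simulated))
print(result["city"])  # Oslo
```

With streaming enabled, you can start parsing or displaying partial output as soon as the first chunks arrive, which is usually where the biggest perceived-latency win comes from.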
For more information, see: https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/structured-outputs?tabs=python-secure%2Cdotnet-entra-id&pivots=programming-language-csharp
I hope this information helps. Thank you!