Improving streaming performance of OpenAI GPT endpoints in Azure

Mattias Stahre 0 Reputation points
2023-12-22T11:52:01.3833333+00:00

Hello,

I am currently testing OpenAI through Azure, but I am experiencing issues with streaming performance. Instead of receiving each word streamed one by one, I get several chunks at once, which makes the stream appear batchy. Each chunk contains a single word as expected, but the packets from the endpoint arrive in bursts rather than continuously.

To clarify, I have confirmed the same behavior when streaming the response with a curl command. Are there any suggestions for optimizing streaming performance for OpenAI GPT endpoints in Azure? Thank you.
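One thing worth checking on the client side: Azure OpenAI streams responses as server-sent events (SSE), and several `data:` events can be coalesced into a single network read, which makes the stream look batchy even though each event carries only one token. A minimal sketch of an incremental SSE parser (a hypothetical helper, not part of any Azure SDK) that handles coalesced and partial reads:

```python
def parse_sse_events(buffer: str):
    """Split a raw SSE buffer into complete event payloads plus a leftover
    remainder. Events are separated by a blank line; an event whose
    terminating blank line has not arrived yet stays in the remainder."""
    events = []
    while "\n\n" in buffer:
        raw, buffer = buffer.split("\n\n", 1)
        for line in raw.splitlines():
            if line.startswith("data: "):
                events.append(line[len("data: "):])
    return events, buffer

# One TCP read may carry several events at once, plus a partial one:
chunk = "data: Hello\n\ndata: world\n\ndata: !"
events, remainder = parse_sse_events(chunk)
# events == ["Hello", "world"]; "data: !" stays buffered until its
# terminating blank line arrives in a later read.
```

If your client only processes one event per network read, coalesced packets will surface as bursts of words; a loop like the one above drains every complete event from each read.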

Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.

1 answer

  1. Saurabh Sharma 23,846 Reputation points Microsoft Employee Moderator
    2023-12-22T23:38:36.2133333+00:00

    Hi @Mattias Stahre

    Welcome to Microsoft Q&A! Thanks for posting the question.

    Can you please try adjusting the max_tokens parameter when making requests to the OpenAI GPT endpoint? The max_tokens parameter controls the maximum number of tokens the GPT model can generate in a single request. By reducing max_tokens, you can increase the frequency of requests and potentially improve streaming performance.
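    As a sketch of the suggestion above, here is a hypothetical request body for the chat completions endpoint with both streaming enabled and a reduced max_tokens cap (the message content and the cap value are placeholders for illustration):

    ```python
    import json

    # Hypothetical Azure OpenAI chat completions payload; the endpoint URL,
    # deployment name, and API key would be supplied separately in the request.
    payload = {
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": True,      # ask the service to stream tokens as SSE events
        "max_tokens": 256,   # cap generation length for each request
    }

    body = json.dumps(payload)
    ```

    The same two fields can be passed on a curl command line as the JSON request body.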

    Please check the Azure OpenAI Service performance & latency article, as it provides background on how latency works with Azure OpenAI and how to optimize your environment to improve performance.

    Please let me know in case you still see any issues.

    Thanks

    Saurabh

