Why is the OpenAI service provided by Microsoft so slow?

Question

Why is the OpenAI service provided by Microsoft so slow?

vedeng 0

I am from China and currently using the Azure OpenAI API. My selected server region is Central Sweden, but the speed is extremely slow, to an unacceptable extent. I am not sure if this is due to the improper choice of region or because of the payment level. Is there any way to improve this situation and make it faster?

vedeng 0 Reputation points

2024-01-12T00:54:36.58+00:00

If it's a matter of region, could you please tell me which region I should choose to get a faster response speed if I am in China?
Dillon Silzer 57,826 Reputation points Volunteer Moderator

2024-01-12T00:55:54.6233333+00:00

Can you please let us know what region and model you are using?
vedeng 0 Reputation points

2024-01-12T02:33:37.49+00:00

Australia East、gpt-4

api-version=2023-12-01-preview
Deleted

This comment has been deleted due to a violation of our Code of Conduct. The comment was manually reported or identified through automated detection before action was taken. Please refer to our Code of Conduct for more information.
vedeng 0 Reputation points

2024-01-12T02:40:29.7933333+00:00

I accessed the REST API using servers in Hong Kong and the United States respectively, and it wasn't very fast either. It doesn't feel like the slowness is caused by network issues.
AshokPeddakotla-MSFT 35,971 Reputation points Moderator

2024-01-18T03:11:23.5066667+00:00

vedeng Just checking if we are still connected on this discussion? Please let us know if you need further help. If this answers your query, do click Accept Answer and Yes for was this answer helpful.

1 answer

Your answer

vedeng 0 Reputation points

2024-01-12T00:54:36.58+00:00

If it's a matter of region, could you please tell me which region I should choose to get a faster response speed if I am in China?
Dillon Silzer 57,826 Reputation points Volunteer Moderator

2024-01-12T00:55:54.6233333+00:00

Can you please let us know what region and model you are using?
vedeng 0 Reputation points

2024-01-12T02:33:37.49+00:00

Australia East、gpt-4

api-version=2023-12-01-preview
Deleted

This comment has been deleted due to a violation of our Code of Conduct. The comment was manually reported or identified through automated detection before action was taken. Please refer to our Code of Conduct for more information.
vedeng 0 Reputation points

2024-01-12T02:40:29.7933333+00:00

I accessed the REST API using servers in Hong Kong and the United States respectively, and it wasn't very fast either. It doesn't feel like the slowness is caused by network issues.
AshokPeddakotla-MSFT 35,971 Reputation points Moderator

2024-01-18T03:11:23.5066667+00:00

vedeng Just checking if we are still connected on this discussion? Please let us know if you need further help. If this answers your query, do click Accept Answer and Yes for was this answer helpful.

Answer 1

vedeng Greetings!

Could you please provide prompt size, max token set which you are using?

Please check the documentation Performance and latency about improving the latency performance.

Here are some of the best practices to lower latency:

Model latency: If model latency is important to you we recommend trying out our latest models in the GPT-3.5 Turbo model series.
Lower max tokens: OpenAI has found that even in cases where the total number of tokens generated is similar the request with the higher value set for the max token parameter will have more latency.
Lower total tokens generated: The fewer tokens generated the faster the overall response will be. Remember this is like having a for loop with n tokens = n iterations. Lower the number of tokens generated and overall response time will improve accordingly.
Streaming: Enabling streaming can be useful in managing user expectations in certain situations by allowing the user to see the model response as it is being generated rather than having to wait until the last token is ready.
Content Filtering improves safety, but it also impacts latency. Evaluate if any of your workloads would benefit from modified content filtering policies.

Please let me know if you have any further queries.

Share via

Why is the OpenAI service provided by Microsoft so slow?

1 answer

Your answer