Increased latency in GPT-4 deployments

Rubén Fernández Isla 0 Reputation points
2023-12-21T11:10:12.4833333+00:00

My organization has several Azure AI services resources with GPT-4 deployments, located in France Central (we are in Spain).

Since late November we have experienced a large increase in latency across all of our GPT-4 deployments.

Before this, the response time for a 2,000-token prompt (request + completion) was about 20 seconds at most. Now, with the same prompt (and the same deployment), the average response time is about 50-60 seconds. The max_tokens parameter is set to 500.
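
For context, these figures come from timing the full chat completion call end to end. A minimal sketch of that kind of measurement (the endpoint, key, and deployment name below are placeholders, and it assumes the openai Python SDK v1.x against Azure OpenAI):

```python
import time
from openai import AzureOpenAI  # requires openai>=1.0

# Placeholder credentials and endpoint -- replace with your own resource values.
client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<your-api-key>",
    api_version="2023-05-15",
)

start = time.perf_counter()
response = client.chat.completions.create(
    model="<your-gpt4-deployment-name>",  # Azure deployment name, not the base model name
    messages=[{"role": "user", "content": "<your ~2000-token prompt>"}],
    max_tokens=500,
)
elapsed = time.perf_counter() - start

print(f"End-to-end latency: {elapsed:.1f} s")
print(f"Prompt tokens: {response.usage.prompt_tokens}, "
      f"completion tokens: {response.usage.completion_tokens}")
```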

We assume this is related to the increase in users and requests (which led to the temporary pause on new GPT-4 subscriptions).

But we would like to ask:

  • Is there some way we can minimize this latency problem?
  • How long is this problem expected to persist?

Thanks in advance

Tags: Azure OpenAI Service, Azure AI services