Increased latency in GPT-4 deployments
My organization has several Azure AI Services resources containing GPT-4 deployments, located in France Central (we are based in Spain).
Since late November we have experienced a large increase in latency across all of our GPT-4 deployments.
Before this, the response time for a 2,000-token request (prompt + completion) was about 20 seconds at most. Now, with the same prompt (and the same deployment), the average response time is about 50-60 seconds. The max_tokens parameter is set to 500.
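For reference, these response times were measured end-to-end around each API call. A minimal sketch of the timing approach, assuming the `openai` Python SDK with an `AzureOpenAI` client (the endpoint, API version, deployment name, and prompt are placeholders, not our actual values):

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Hypothetical usage against a GPT-4 deployment (all names are placeholders):
#
# from openai import AzureOpenAI
# client = AzureOpenAI(azure_endpoint="https://<resource>.openai.azure.com",
#                      api_key="<key>", api_version="<api-version>")
# response, seconds = timed(
#     client.chat.completions.create,
#     model="<gpt-4-deployment-name>",
#     messages=[{"role": "user", "content": "<~2000-token prompt>"}],
#     max_tokens=500,
# )
```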
We assume this is related to the growth in users and request volume (which also led to the temporary suspension of new GPT-4 subscriptions).
But we would like to ask:
- Is there any way we can mitigate this latency problem?
- How long is this problem expected to persist?
Thanks in advance