API Limits and Performance of Lanugage Service

Prakhar Birla 1 Reputation point
2022-12-29T12:30:21.667+00:00

I'm trying to use the Converational Language Understanding (CLU) of the Cognitive Language Service in an high-performance use case where I'm trying to make say 50 calls per second (TPS) to this service.

The documentation says the limit is 1000 TPM for the Prediction Service. Which would be ~16 TPS, much lower that what I need to achieve. Earlier LUIS would allow the deployment of multiple prediction resources paired with one Authoring resource for scaling further. Also, LUIS allowed containerized prediction deployment and would get ~40 TPS with a 1-core 4GB RAM machine.

Now I'm getting ~200 ms avg with ~16 TPS with the CLU service and the consumer in the same region. How can I scale this setup?

Azure AI Language
Azure AI Language
An Azure service that provides natural language capabilities including sentiment analysis, entity extraction, and automated question answering.
353 questions
Azure AI services
Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.
2,367 questions
{count} votes