API Limits and Performance of Language Service

Prakhar Birla 1 Reputation point
2022-12-29T12:30:21.667+00:00

I'm trying to use Conversational Language Understanding (CLU) in the Cognitive Service for Language for a high-performance use case that needs about 50 calls per second (TPS) to this service.

The documentation says the limit is 1000 TPM for the Prediction Service, which works out to ~16 TPS, much lower than what I need to achieve. Earlier, LUIS allowed multiple prediction resources to be paired with one Authoring resource for further scaling. LUIS also allowed containerized prediction deployment, which reached ~40 TPS on a 1-core, 4 GB RAM machine.
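To make the gap concrete, here is a rough sketch of the arithmetic using only the numbers above (1000 TPM documented limit, 50 TPS target). The constant and function names are just illustrative, not part of any Azure SDK.

```python
# Figures taken from the question; names are illustrative only.
DOCUMENTED_LIMIT_TPM = 1000  # prediction limit quoted from the documentation
TARGET_TPS = 50              # throughput the use case needs


def tpm_to_tps(tpm: float) -> float:
    """Convert transactions per minute to transactions per second."""
    return tpm / 60.0


def tps_to_tpm(tps: float) -> float:
    """Convert transactions per second to transactions per minute."""
    return tps * 60.0


limit_tps = tpm_to_tps(DOCUMENTED_LIMIT_TPM)             # roughly 16.7 TPS
required_tpm = tps_to_tpm(TARGET_TPS)                    # 3000 TPM
shortfall_factor = required_tpm / DOCUMENTED_LIMIT_TPM   # 3x the documented limit

print(f"limit: {limit_tps:.2f} TPS")
print(f"required: {required_tpm:.0f} TPM ({shortfall_factor:.1f}x the limit)")
```

So the target is about three times what a single resource is documented to allow.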

Now I'm getting ~200 ms average latency at ~16 TPS, with the CLU service and the consumer in the same region. How can I scale this setup?
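For reference, a back-of-the-envelope estimate of how many requests are in flight at these rates, using Little's law (concurrency = throughput x latency). This assumes the ~200 ms average stays the same at higher load, which I have not verified; the function name is just illustrative.

```python
def in_flight_requests(tps: float, latency_seconds: float) -> float:
    """Little's law: average concurrent requests = arrival rate * average latency."""
    return tps * latency_seconds


AVG_LATENCY_S = 0.200  # ~200 ms average observed, per the question

current = in_flight_requests(16, AVG_LATENCY_S)  # about 3 concurrent requests today
target = in_flight_requests(50, AVG_LATENCY_S)   # about 10, assuming latency does not degrade

print(f"current: {current:.1f} in flight, target: {target:.1f} in flight")
```

So the client-side concurrency needed is small; the bottleneck is the service-side rate limit, not the number of parallel connections.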

Azure Language Understanding (LUIS)
A feature of Azure Cognitive Service for Language that uses natural language understanding to enable people to interact with apps, bots, and internet of things devices.
Azure QnA Maker
An Azure Cognitive Service for Language feature that distills information into conversational answers.
Cognitive Service for Language
An Azure service that provides natural language capabilities including sentiment analysis, entity extraction, and automated question answering.
Azure Cognitive Services
A group of Azure artificial intelligence services and cognitive APIs that help build intelligent apps.