Slow response to GenerateAnswer

Hilo 41 Reputation points

We have been using the QnA Maker service for some time. We created three separate services so that we have a knowledge base for each of three languages. We noticed that the GenerateAnswer API is sometimes very slow to respond. This is our configuration:

  • 1 App Service plan (S1), located in West Europe;
  • 3 App Services, one for each language (Always On enabled);
  • 3 Cognitive Services resources of type QnA Maker (2 on the F0 tier, 1 on S0), located in West US (the only option for now);
  • 3 Search services (Basic tier, 1 replica each), located in West Europe.

We tried several things to improve response time, but Application Insights shows that a call often takes several seconds: sometimes it is 200 ms, which is fine, and sometimes it is 10 s, which is bad (once we even reached 30 s). The averages per language over the last 30 days are 1.34 s, 9.48 s, and 2.31 s (please note that the last value comes from an App Service that did not have Always On enabled; we changed that only today). For example, we tried using 3 replicas of the Search service, but the results did not improve.
We are aware that a new Preview version of QnA Maker is available. We were also thinking of switching to it to see whether it improves response time, but that would require changes to our infrastructure that are not convenient at the moment (for example, the QnAMaker libraries for Bot Builder that we use have not yet been updated to support the Preview version, as far as I can see).

Now, the question is: are these response times expected? If not, what could we change to improve performance?

Thank you.

Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.

Accepted answer
  1. romungi-MSFT 39,696 Reputation points Microsoft Employee

    @Hilo Based on my experience, the slowness of the QnA Maker API mostly depends on the App Service plan used for the app service. In this case, it looks like you have three app services on one App Service plan, so they share the same compute resources. Sharing one plan across several app services is common practice, but if you are seeing persistent slowness, you can try moving one of the app services to a separate plan and check whether performance improves. You can also use the scale-up option to upgrade your plan to P1V2 or P1V3, and scale back down if the extra capacity is not required.
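
    To check whether a plan change actually helps, it may be useful to compare the latency distribution before and after rather than the average alone, since the question describes responses ranging from 200 ms to 10 s. The following is a minimal Python sketch, not an official tool. The helper names (`time_calls`, `summarize`, `make_generate_answer_call`) are made up for illustration, and the URL shape and `EndpointKey` header are assumptions that should be checked against the sample request shown on your knowledge base's Publish page.

```python
import json
import math
import statistics
import time
import urllib.request
from typing import Callable, Dict, List


def time_calls(call: Callable[[], object], n: int = 20) -> List[float]:
    """Invoke `call` n times and return each duration in seconds."""
    durations = []
    for _ in range(n):
        start = time.perf_counter()
        call()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> Dict[str, float]:
    """Return mean, median, nearest-rank 95th percentile, and max."""
    ordered = sorted(durations)
    # Nearest-rank p95: the smallest sample with at least 95% of samples at or below it.
    rank = max(1, math.ceil(0.95 * len(ordered)))
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[rank - 1],
        "max": ordered[-1],
    }


def make_generate_answer_call(host: str, kb_id: str, endpoint_key: str,
                              question: str) -> Callable[[], None]:
    """Build a zero-argument callable that sends one GenerateAnswer request.

    Assumption: the runtime endpoint has the form
    {host}/qnamaker/knowledgebases/{kb_id}/generateAnswer and accepts an
    "Authorization: EndpointKey <key>" header. Verify against your own service.
    """
    url = f"{host}/qnamaker/knowledgebases/{kb_id}/generateAnswer"
    body = json.dumps({"question": question}).encode("utf-8")
    headers = {
        "Authorization": f"EndpointKey {endpoint_key}",
        "Content-Type": "application/json",
    }

    def call() -> None:
        request = urllib.request.Request(url, data=body, headers=headers, method="POST")
        with urllib.request.urlopen(request, timeout=60) as response:
            response.read()  # Read the full body so the timing covers the whole response.

    return call
```

    For example, `summarize(time_calls(make_generate_answer_call(host, kb_id, key, "test question")))` run once against the shared plan and once after isolating or scaling up gives two summaries to compare. With nine responses of 0.2 s and one of 10 s, the mean is about 1.18 s while the median stays at 0.2 s, which is why the median and 95th percentile are reported alongside the mean.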
