Slow response to GenerateAnswer

Question

Hilo 41 Reputation points

Hello,
We have been using the QnA Maker service for some time. We created three separate services so that we can have knowledge bases in three different languages. We have noticed that the GenerateAnswer API is sometimes very slow to respond. This is our configuration:

1 App Service Plan (S1), located in West Europe;
3 App Services, one for each language (Always On setting turned on);
3 Cognitive Services resources of type QnA Maker (2 on the F0 tier, 1 on S0), located in West US (the only option for now);
3 Search Services (Basic tier, 1 replica each), located in West Europe.

We tried different things to improve the response time, but Application Insights shows that a request often takes several seconds: sometimes it is 200 ms, which is fine, and sometimes it is 10 s, which is bad (once we even reached 30 s). The averages for each language over the last 30 days are 1.34 s, 9.48 s, and 2.31 s (please note that the last value refers to an App Service that did not have the Always On setting enabled; we changed that only today). We tried, for example, using 3 replicas of the Search Service, but the results did not improve.
We are aware that a new Preview version of QnA Maker is available. We were also thinking of switching to it to see whether it improves the response time, but that requires some changes to our infrastructure that are not ideal at the moment (for example, the QnAMaker libraries for Bot Builder, which we use, have not yet been updated to support the Preview version, as far as I can see).
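
Because the observed times range from 200 ms to 30 s, a single average per language hides most of the picture. The following is a minimal sketch, assuming the request durations have been exported from Application Insights as a list of seconds; the `summarize` helper and the sample values are illustrative only and are not our real data.

```python
import math
import statistics


def summarize(durations_s):
    """Return mean, median, 95th percentile and max of request durations (seconds)."""
    ordered = sorted(durations_s)
    count = len(ordered)

    def percentile(p):
        # Nearest-rank percentile: smallest value with at least p% of samples at or below it.
        rank = max(1, math.ceil(p * count / 100))
        return ordered[rank - 1]

    return {
        "mean": statistics.fmean(ordered),
        "p50": percentile(50),
        "p95": percentile(95),
        "max": ordered[-1],
    }


# Illustrative samples: mostly fast responses plus two slow outliers.
samples = [0.2] * 8 + [10.0, 30.0]
summary = summarize(samples)
print(summary)
```

With a distribution like this, the median stays low while the mean is pulled up by a few slow requests, which is the pattern to look for before deciding what to scale.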

Now, the question is: are those response times expected? If not, what could we change to improve performance?

Thank you.

Answer accepted by question author

Answer 1

Rohit Mungi 49,131 Microsoft Employee Moderator

@Hilo In my experience, the slowness of the QnA Maker API mostly depends on the App Service plan used for the app service. In this case, it looks like you have three different app services on one App Service plan, so they share the same compute resources. It is common practice to use the same plan for different app services, but if you are seeing persistent slowness, you can try isolating one of the app services on a different plan and check whether that improves performance. You can also use the scale-up option to upgrade your plan to P1V2 or P1V3, and scale down again if the extra capacity is not required.

Hilo 41 Reputation points

2021-02-17T09:03:09.483+00:00

@Rohit Mungi Thank you for your reply. Yes, as you stated, we are using the same plan for the three app services. We checked the plan's metrics and noticed that CPU and memory usage were quite high. As you suggested, we have tried upgrading the plan to P1V2; we will now wait some time to see whether performance improves.

By the way, we were also checking the Cognitive Services metrics and found high latency. We were trying to isolate the GenerateAnswer operation in order to see how latency is distributed in that specific case, separately from other operations such as updating the knowledge base, but we found that this option is not listed in the filter list.

Is this expected behaviour?
Thank you again.
Rohit Mungi 49,131 Reputation points Microsoft Employee Moderator

2021-02-22T09:40:19.597+00:00

@Hilo The operations mentioned in the list are the currently supported metrics for latency in this namespace.
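
If GenerateAnswer cannot be isolated in the metrics filter, one possible workaround is to time the calls from the client side. The sketch below is only an illustration under stated assumptions: it assumes the GA runtime endpoint format `https://<app-service-host>/qnamaker/knowledgebases/<kb-id>/generateAnswer` with an `EndpointKey` authorization header, and `build_request`, `send_once` and `time_calls` are hypothetical helper names, not part of any SDK.

```python
import json
import time
import urllib.request


def build_request(host, kb_id, endpoint_key, question):
    """Build a GenerateAnswer POST request (host, kb_id and key are placeholders)."""
    url = f"https://{host}/qnamaker/knowledgebases/{kb_id}/generateAnswer"
    body = json.dumps({"question": question}).encode("utf-8")
    headers = {
        "Authorization": f"EndpointKey {endpoint_key}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def send_once(request):
    """Send the request and read the full response body."""
    with urllib.request.urlopen(request, timeout=60) as response:
        response.read()


def time_calls(send, count):
    """Call send() count times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(count):
        start = time.perf_counter()
        send()
        durations.append(time.perf_counter() - start)
    return durations


# Example usage (placeholders must be replaced with real values before running):
# request = build_request("<app-service-host>", "<kb-id>", "<endpoint-key>", "Hello")
# durations = time_calls(lambda: send_once(request), 20)
# print(min(durations), max(durations))
```

Running this from a machine close to the App Service region keeps network distance from dominating the measurement; comparing the results before and after a plan change shows whether the upgrade helped this specific operation.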
