We have been using the QnA Maker service for some time now. We created three separate services in order to have knowledge bases in three different languages. We noticed that the generate answer API is sometimes very slow to respond. This is our configuration:
- 1 App Service Plan (S1), located in West Europe;
- 3 App Services, one for each language (Always on setting turned on);
- 3 Cognitive Services resources of type QnA Maker (2 on the F0 tier, 1 on S0), located in West US (the only option for now);
- 3 Search Services (Basic tier, 1 replica each), located in West Europe.
We tried several things to improve the response time, but Application Insights shows that a call often takes several seconds: sometimes it is 200 ms, which is fine, and sometimes it is 10 s, which is bad (once we even reached 30 s). The averages per language over the last 30 days are 1.34 s, 9.48 s and 2.31 s (please note that the last value comes from an App Service that did not have the Always on setting enabled; we changed that only today). For example, we tried using 3 replicas of the Search Service, but the results did not improve.
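Since a plain average hides the difference between the 200 ms calls and the 10 s ones, here is a minimal sketch of how the spread could be measured and summarized. It is only an illustration: `time_calls` and `summarize` are hypothetical helper names, the callable passed to `time_calls` stands in for whatever actually sends the request to the generate answer endpoint, and the sample values at the bottom are made up to mimic the pattern described above, not real measurements.

```python
import math
import statistics
import time
from typing import Callable, Dict, List


def time_calls(call: Callable[[], object], n: int) -> List[float]:
    """Invoke `call` n times and return each wall-clock duration in seconds.

    `call` is a placeholder for the real request to the QnA endpoint.
    """
    durations = []
    for _ in range(n):
        start = time.perf_counter()
        call()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> Dict[str, float]:
    """Return min, median, mean, 95th percentile (nearest rank) and max."""
    ordered = sorted(durations)
    # Nearest-rank percentile: the smallest value with at least 95% of
    # the samples at or below it.
    rank = max(1, math.ceil(0.95 * len(ordered)))
    return {
        "min": ordered[0],
        "median": statistics.median(ordered),
        "mean": statistics.fmean(ordered),
        "p95": ordered[rank - 1],
        "max": ordered[-1],
    }


if __name__ == "__main__":
    # Illustrative values only: nine fast calls and one slow one.
    samples = [0.2] * 9 + [10.0]
    for name, value in summarize(samples).items():
        print(f"{name}: {value:.2f}s")
```

With samples like these, the median stays at the fast value while the mean and the 95th percentile are pulled up by the single slow call, which would help tell occasional cold starts apart from a consistently slow service.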
We are aware that a new Preview version of QnA Maker is available. We were also thinking of switching to it to see whether the new version improves the response time, but that would require changes to our infrastructure that are not convenient at the moment (for example, the QnAMaker libraries for Bot Builder that we use have not yet been updated to support the Preview version, as far as I can see).
Now, the question is: are these response times expected? If not, what could we change to improve performance?