@Wayne Lee There have been no recent updates to the CLI or the endpoint versions that would cause such inconsistencies. The output can differ when the microphone is used, because of factors like audio quality and background noise. With an audio file, the results should be consistent. Is there a major difference in the output in your case? There may be minor updates to the endpoints to fix bugs that could affect the quality of the model, but these should not cause a major difference in the recognition output. Could you provide your audio and the responses from both runs, along with your resource details such as the region of your speech resource?
Azure ASR returns different recognition results at different times with the same audio file
I am learning to use Azure Speech Service. I used the Speech CLI and sent the exact same audio file to Azure, but I received different results. I have been testing the CLI parameters, so I sent the audio on 4/13 and again today, using the same command below:
spx recognize --file MyAudioFileName.wav --output batch file MyAudioFileName.json
But the results I got today were different from the results I received on 4/13.
Was this because the Azure ASR models/engine have been updated, or because something else changed? On my side, I used the same account, the same command, and the same audio file.
I want to get consistent results.
Please confirm the reason for the difference in the results.
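To answer "is there a major difference?" with something more concrete than eyeballing, the recognized text from two runs can be compared word by word. A minimal sketch in Python, assuming the recognized text has already been extracted from each JSON output file into a plain string (the layout of the `spx` output file is not shown in this thread, so that step is left out). `word_diff` and `word_error_rate` are hypothetical helper names, not part of the Speech CLI or SDK.

```python
import difflib


def word_diff(reference: str, hypothesis: str) -> list[tuple[str, str, str]]:
    """Return (operation, reference words, hypothesis words) for each span that differs."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    matcher = difflib.SequenceMatcher(a=ref, b=hyp, autojunk=False)
    changes = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":  # keep only replace/delete/insert spans
            changes.append((op, " ".join(ref[i1:i2]), " ".join(hyp[j1:j2])))
    return changes


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] holds the edit distance between the previous ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(hyp)] / max(len(ref), 1)


if __name__ == "__main__":
    # Hypothetical transcripts from two runs on the same file.
    run_a = "table for two please by the window"
    run_b = "table for two by the window"
    print(word_diff(run_a, run_b))        # [('delete', 'please', '')]
    print(word_error_rate(run_a, run_b))  # one deleted word out of seven
```

A rate near zero points to small wording changes; a whole missing segment, as described later in this thread, shows up as a long `delete` span.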
@romungi-MSFT Thanks for the information. The difference is big: the results I obtained on 4/12 were missing one session compared with the results I obtained on 4/15 and later. I tested twice on 4/15 and again today, and those results are consistent. I tried to upload the WAV and JSON files, but was told that only .txt, .pdf, and .log files can be uploaded here. I added a ".txt" extension to the WAV and JSON files, but the upload still failed. Anyway, since the results have been consistent since 4/15, I think it is OK. If I find any other cases of this kind of inconsistency, I will report them to you. By the way, the speech data was recorded in noisy environments, such as restaurants. Thanks again for your help.