Azure Speech-To-Text: Accuracy difference b/w Rest API vs. SDK for short audio

Kun Wu 146 Microsoft Employee

Hello,

I have a lot of short audio WAV files, each about 5 seconds long. When I transcribe them with the Azure Speech-To-Text REST API and with the Java SDK, the REST API's recognition accuracy always seems slightly worse than the Java SDK's, though the gap is less than 1% CER (Character Error Rate).

REST API: https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/rest-speech-to-text#speech-to-text-rest-api-for-short-audio
Java SDK: https://learn.microsoft.com/en-us/java/api/com.microsoft.cognitiveservices.speech.speechrecognizer.startcontinuousrecognitionasync?view=azure-java-stable#com_microsoft_cognitiveservices_speech_SpeechRecognizer_startContinuousRecognitionAsync_ _

Why is there such a gap between REST and the SDK?

Thank you.
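
For readers who want to reproduce the comparison, here is a minimal sketch of how a corpus-level CER could be computed. It assumes CER is defined as the total character-level edit distance divided by the total number of reference characters, with no text normalization applied. The function names `edit_distance` and `corpus_cer` are hypothetical, and the question does not say how its CER was actually scored.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def corpus_cer(pairs):
    """CER over (reference, hypothesis) pairs: total edits / total reference characters."""
    pairs = list(pairs)
    total_edits = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total_chars = sum(len(ref) for ref, _ in pairs)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    return total_edits / total_chars
```

Running `corpus_cer` once over the REST transcripts and once over the SDK transcripts, against the same references, gives the two figures being compared. With a gap below 1%, a difference in normalization (punctuation, casing, whitespace) between the two outputs could plausibly account for part of it, so it may be worth normalizing both the same way before scoring.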

romungi-MSFT 46,831 Reputation points Microsoft Employee

2021-08-24T08:51:28.517+00:00
@Kun Wu What endpoints are you using with the REST API and the SDK? Ideally, if you are using the same region and request parameters, the results should be the same.
You could also check whether the SDK returns the same result when the speech config is created with fromEndpoint() using the REST API endpoint, i.e. something similar to this endpoint, adjusted for your language and region:

https://westus.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language=en-US
Kun Wu 146 Reputation points Microsoft Employee

2021-08-24T09:18:09.567+00:00
Hello @romungi-MSFT,

Yes, my comparison uses the same region for both, and the gap is consistent. The regions I tried are chinaeast2 and eastasia.

For REST, I'm using the URL and headers below.

url='https://eastasia.stt.speech.azure.cn/speech/recognition/conversation/cognitiveservices/v1?language=' + LANG

headers = { 'Accept': 'application/json;text/xml', 'Content-Type': 'audio/wav;codecs="audio/pcm";samplerate=16000', 'Ocp-Apim-Subscription-Key': <my key>, 'format': 'detailed' }

For the SDK, I'm using the API below:

SpeechConfig.fromSubscription("<my key>", "eastasia")

FYI.
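
One detail in the REST request quoted above may be worth checking: the REST documentation for short audio lists `format` as a query parameter alongside `language`, not as a request header. Below is a minimal sketch of building the URL and headers that way, using only the standard library. The helper name `build_short_audio_request` is hypothetical, and `en-US` stands in for the unspecified `LANG` value. Whether this has any bearing on the accuracy gap is not established in this thread.

```python
from urllib.parse import urlencode


def build_short_audio_request(host, language, key, detailed=True):
    """Return (url, headers) for a short-audio recognition request.

    `format=detailed` is placed in the query string, as an assumption based on
    the REST documentation, instead of being sent as a header.
    """
    query = {"language": language}
    if detailed:
        query["format"] = "detailed"
    url = (f"https://{host}/speech/recognition/conversation/"
           f"cognitiveservices/v1?{urlencode(query)}")
    headers = {
        "Accept": "application/json;text/xml",
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Ocp-Apim-Subscription-Key": key,
    }
    return url, headers


# Host taken from the post above; the key is a placeholder.
url, headers = build_short_audio_request(
    "eastasia.stt.speech.azure.cn", "en-US", "<my key>")
```

The returned `url` and `headers` can be passed to any HTTP client together with the WAV bytes as the request body.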
Kun Wu 146 Reputation points Microsoft Employee

2021-08-26T08:32:39.19+00:00

Hello,

Are there any findings or updates, please?

Thank you.
romungi-MSFT 46,831 Reputation points Microsoft Employee

2021-08-26T09:13:04.4+00:00

Unfortunately, I do not have any updates. I think it would be easier to report this on the SDK issues page, attaching the speech files in the language you are using, so it can be checked as a possible bug. This will also allow the SDK team to check whether there is any discrepancy in the results from both calls.
Kun Wu 146 Reputation points Microsoft Employee

2021-08-26T11:47:51.43+00:00

Thanks for the suggestion, @romungi-MSFT. Link to the issue:
https://github.com/Azure-Samples/cognitive-services-speech-sdk/issues/1230
