Pronunciation assessment SDK is getting stuck

Dan Tang 0

I'm trying to integrate pronunciation assessment from the Speech service Python SDK. A web front-end uploads an audio file to a FastAPI backend, which transcribes it with Whisper and then sends the transcription together with the audio file to Microsoft's endpoint for evaluation. However, each time I do so, the call hangs and I get a (Timeout: no recognition result received) error after 10+ seconds.

I suspect that the error might be similar to the one in "Microsoft Cognitive SpeechRecognizer Stuck", but a) I'm using the Python SDK, which does not have the FromWavFileInput method, and b) I tried adding 100 KB of empty buffer, but it still does not work.
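For anyone reproducing the buffer experiment, here is a minimal sketch of one way to check an uploaded file and pad it with trailing silence using only the Python standard library. It is not the code from the linked post. The helper names `describe_wav` and `append_silence` are hypothetical, and the sketch assumes the upload is an uncompressed 16-bit PCM WAV. My understanding is that the Speech SDK expects 16 kHz, 16-bit, mono PCM WAV input by default, but that should be confirmed against the official documentation.

```python
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Return basic format facts for WAV bytes (hypothetical helper).

    wave.open raises wave.Error when the bytes are not an uncompressed
    PCM WAV file, for example when a browser uploaded WebM or Ogg audio.
    """
    with wave.open(io.BytesIO(data), "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": wav.getframerate(),
            "duration_s": wav.getnframes() / wav.getframerate(),
        }


def append_silence(data: bytes, seconds: float) -> bytes:
    """Return a new WAV with `seconds` of trailing silence (hypothetical helper).

    Assumes 16-bit PCM, where silence is encoded as zero bytes.
    (8-bit WAV is unsigned, so zero bytes would not be silent there.)
    """
    with wave.open(io.BytesIO(data), "rb") as src:
        params = src.getparams()
        frames = src.readframes(src.getnframes())

    bytes_per_frame = params.sampwidth * params.nchannels
    silence = b"\x00" * (int(seconds * params.framerate) * bytes_per_frame)

    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setparams(params)  # the frame count in the header is fixed up on close
        dst.writeframes(frames + silence)
    return out.getvalue()
```

Calling `describe_wav(upload_bytes)` before handing the audio to the recognizer shows whether the file is really PCM WAV and what its sample rate and channel count are. If it raises `wave.Error`, the upload is probably in a compressed container, and padding it with zero bytes would not help.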

Wondering if anyone has any suggestions? I've posted my code on https://stackoverflow.com/questions/78783121/microsoft-cognitive-speech-services-sdk-python-is-getting-stuck?noredirect=1#comment138902133_78783121 as well.

VasaviLankipalle-MSFT 17,476 Reputation points

2024-07-24T04:20:34.3666667+00:00

Hello @Dan Tang, thanks for using the Microsoft Q&A platform.

You mentioned that the Python SDK lacks the FromWavFileInput method. Have you tried using other SDKs, such as C#, and encountered no issues with this?
