Hello @santoshkc,
Thank you for your answer.
We use Azure Speech to Text (STT) in our application to get transcripts directly from user audio, so we have no reference text on our side.
We are currently using the SpeechRecognition Python SDK with pronunciation assessment enabled. This configuration gives us both the transcript and the pronunciation assessment in a single pass. However, we have noticed that disabling pronunciation assessment (using speech-to-text only) produces higher-quality transcripts.
Is there a way to first get the transcript from Azure STT and then perform the pronunciation assessment on that transcript afterward?
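To make the question concrete, here is a rough sketch of the two-pass flow we have in mind, assuming the `azure-cognitiveservices-speech` Python package. The function names (`transcribe_plain`, `assess_with_reference`, `two_pass_assessment`) and the key/region arguments are our own placeholders, not part of the SDK. Whether feeding the STT transcript back in as the reference text gives meaningful scores is exactly what we are unsure about.

```python
from typing import Callable, Dict


def transcribe_plain(audio_path: str, key: str, region: str,
                     language: str = "en-US") -> str:
    """Pass 1: plain speech-to-text, no pronunciation assessment attached."""
    import azure.cognitiveservices.speech as speechsdk  # imported lazily

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    speech_config.speech_recognition_language = language
    audio_config = speechsdk.audio.AudioConfig(filename=audio_path)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config)

    # recognize_once() returns a single utterance; longer audio would need
    # continuous recognition instead.
    result = recognizer.recognize_once()
    if result.reason != speechsdk.ResultReason.RecognizedSpeech:
        raise RuntimeError(f"Recognition failed: {result.reason}")
    return result.text


def assess_with_reference(audio_path: str, reference_text: str, key: str,
                          region: str, language: str = "en-US") -> Dict[str, float]:
    """Pass 2: pronunciation assessment on the same audio, using the
    pass-1 transcript as reference text (our assumption, not a confirmed
    recommended practice)."""
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    speech_config.speech_recognition_language = language
    audio_config = speechsdk.audio.AudioConfig(filename=audio_path)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config)

    pa_config = speechsdk.PronunciationAssessmentConfig(
        reference_text=reference_text,
        grading_system=speechsdk.PronunciationAssessmentGradingSystem.HundredMark,
        granularity=speechsdk.PronunciationAssessmentGranularity.Phoneme,
    )
    pa_config.apply_to(recognizer)

    result = recognizer.recognize_once()
    if result.reason != speechsdk.ResultReason.RecognizedSpeech:
        raise RuntimeError(f"Assessment failed: {result.reason}")
    pa_result = speechsdk.PronunciationAssessmentResult(result)
    return {
        "accuracy": pa_result.accuracy_score,
        "fluency": pa_result.fluency_score,
        "completeness": pa_result.completeness_score,
        "pronunciation": pa_result.pronunciation_score,
    }


def two_pass_assessment(
    audio_path: str,
    transcribe: Callable[[str], str],
    assess: Callable[[str, str], Dict[str, float]],
) -> Dict[str, object]:
    """Run STT first, then assess the same audio against that transcript."""
    transcript = transcribe(audio_path)
    if not transcript.strip():
        raise ValueError("Empty transcript; nothing to assess against.")
    scores = assess(audio_path, transcript)
    return {"transcript": transcript, "scores": scores}
```

We would call it roughly like this, with our real key and region in place of the placeholders:

```python
result = two_pass_assessment(
    "user_audio.wav",
    transcribe=lambda p: transcribe_plain(p, "<key>", "<region>"),
    assess=lambda p, ref: assess_with_reference(p, ref, "<key>", "<region>"),
)
```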