How can we do language detection on an audio URL before performing speech-to-text with the Cognitive Services API?

Jeb Million 21 Reputation points
2021-07-19T08:40:21.393+00:00

Hello all,

I can use the speech-to-text batch transcription API by providing a single language in the request body. I can also detect the language of audio files in my local directory beforehand and pass it in the request body.

But I want to detect the language from an audio URL and pass that to the request body.

{
  "contentUrls": [
    "<URL to an audio file 1 to transcribe>"
  ],
  "properties": {
    "wordLevelTimestampsEnabled": true
  },
  "locale": "<detected language>",
  "model": {
    "self": "https://westus.api.cognitive.microsoft.com/speechtotext/v3.0/models/{id}"
  },
  "displayName": "Transcription of file using default model for en-US"
}

Is there any way to do the same?


1 answer

  1. Ramr-msft 17,736 Reputation points
    2021-07-19T13:25:53.18+00:00

    @Jeb Million Thanks for the question. Language identification can be used to determine the language being spoken in audio that has been passed to the Speech SDK.

    Please follow this document for a C# sample:
    https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-automatic-language-detection?pivots=programming-language-csharp
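
    Here is a minimal C# sketch of that approach, assuming the Speech SDK (Microsoft.CognitiveServices.Speech NuGet package), a WAV audio URL, and placeholder key, region, and candidate locales (none of these are from the original post). The Speech SDK reads audio from files or streams rather than directly from a URL, so the file is downloaded first:

        using System;
        using System.IO;
        using System.Net.Http;
        using System.Threading.Tasks;
        using Microsoft.CognitiveServices.Speech;
        using Microsoft.CognitiveServices.Speech.Audio;

        class Program
        {
            static async Task Main()
            {
                // Download the remote audio to a temporary local file;
                // the Speech SDK does not accept a URL as audio input.
                var audioUrl = "<URL to an audio file 1 to transcribe>";
                var localPath = Path.GetTempFileName();
                using (var http = new HttpClient())
                {
                    var bytes = await http.GetByteArrayAsync(audioUrl);
                    await File.WriteAllBytesAsync(localPath, bytes);
                }

                var speechConfig = SpeechConfig.FromSubscription("<YourKey>", "<YourRegion>");

                // Language identification picks the best match from a set of
                // candidate locales (placeholders here; use your own set).
                var autoDetectConfig = AutoDetectSourceLanguageConfig.FromLanguages(
                    new[] { "en-US", "de-DE", "fr-FR" });

                using var audioConfig = AudioConfig.FromWavFileInput(localPath);
                using var recognizer = new SpeechRecognizer(
                    speechConfig, autoDetectConfig, audioConfig);

                var result = await recognizer.RecognizeOnceAsync();
                var lidResult = AutoDetectSourceLanguageResult.FromResult(result);

                // Pass this value as the "locale" in the batch transcription request body.
                Console.WriteLine($"Detected language: {lidResult.Language}");
            }
        }

    The detected locale string (for example "en-US") can then be substituted into the "locale" field of the batch transcription request shown in the question.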

    1 person found this answer helpful.
