Hi,
I am using Azure Speech to Text to transcribe some audio files into text. The method I am using is the Python method speech_recognizer.start_continuous_recognition(),
from the "Continuous recognition" section of https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/get-started-speech-to-text?tabs=windowsinstall&pivots=programming-language-python
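For context, my current setup is roughly the following, adapted from that quickstart (the key, region and filename are placeholders for my real values):

```python
import time
import azure.cognitiveservices.speech as speechsdk

# Placeholders: substitute your own subscription key, region and audio file.
speech_config = speechsdk.SpeechConfig(subscription="YOUR_KEY", region="YOUR_REGION")
audio_config = speechsdk.audio.AudioConfig(filename="input.wav")
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config,
                                        audio_config=audio_config)

done = False

def stop_cb(evt):
    # Fires when the session ends or recognition is canceled.
    global done
    done = True

# Print each recognized phrase as it arrives.
recognizer.recognized.connect(lambda evt: print(evt.result.text))
recognizer.session_stopped.connect(stop_cb)
recognizer.canceled.connect(stop_cb)

recognizer.start_continuous_recognition()
while not done:
    time.sleep(0.5)
recognizer.stop_continuous_recognition()
```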
It is working very well, but now, to improve my use case, I need to add a few things. Could you please help me with the questions below?
- Language detection:
From what I tried with the code samples, language detection on an audio file can only be done by providing a set of up to 4 candidate languages. Is that correct?
Is there any way the API can detect the language without me having to provide a set of candidate languages?
What would happen if an audio file contains more than 4 different languages, some of which are not in the candidate set?
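For reference, the candidate-language approach I tried looks roughly like this (the four locales here are just an example set; key, region and filename are placeholders):

```python
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="YOUR_KEY", region="YOUR_REGION")
audio_config = speechsdk.audio.AudioConfig(filename="input.wav")

# Language identification with up to four candidate locales.
auto_detect_config = speechsdk.languageconfig.AutoDetectSourceLanguageConfig(
    languages=["en-US", "de-DE", "fr-FR", "es-ES"])

recognizer = speechsdk.SpeechRecognizer(
    speech_config=speech_config,
    auto_detect_source_language_config=auto_detect_config,
    audio_config=audio_config)

result = recognizer.recognize_once()
# The detected locale is exposed on the result.
detected = speechsdk.AutoDetectSourceLanguageResult(result).language
print(detected, result.text)
```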
- Explicit language, profanity check and bad-language warning:
Does the Azure Speech to Text API provide a way to detect bad language and mask those words with stars (****) or similar? Does the API provide any flags/options for this? Could you provide some code samples if available?
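I did spot a set_profanity() option on SpeechConfig in the SDK reference; is something like the sketch below the intended mechanism? (Untested on my side; key, region and filename are placeholders.)

```python
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="YOUR_KEY", region="YOUR_REGION")
# Masked replaces profanity with asterisks in the transcript;
# ProfanityOption.Removed and ProfanityOption.Raw are the alternatives.
speech_config.set_profanity(speechsdk.ProfanityOption.Masked)

audio_config = speechsdk.audio.AudioConfig(filename="input.wav")
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config,
                                        audio_config=audio_config)
print(recognizer.recognize_once().text)
```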
- Metadata:
When processing a file, is there a way the Azure Speech to Text API could provide metadata on it? For example, if an audio file is uploaded, can it return a set of flags, or a keyword summary of how many times a certain word has been used, etc.? Could you provide some code samples if available?
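To clarify what I mean by a keyword summary: right now I could build the counts myself on the client side from the recognized text segments, e.g. (plain Python, no Azure-specific calls; keyword_counts is just my own helper):

```python
import re
from collections import Counter

def keyword_counts(transcripts):
    """Count word occurrences across recognized-text segments."""
    words = []
    for text in transcripts:
        # Lowercase and split on non-letter characters.
        words.extend(re.findall(r"[a-z']+", text.lower()))
    return Counter(words)

# Example with two recognized segments:
segments = ["Hello world", "hello again, world"]
counts = keyword_counts(segments)
print(counts["hello"])  # 2
print(counts["world"])  # 2
```

What I am hoping for is something like this provided by the service itself, rather than me post-processing the transcripts.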