Hi @Sakib Ali Choudhary,
Thank you for contacting Microsoft Q&A.
I understand that you want real-time speech-to-text transcription in multiple languages. I will be happy to assist you with this.
To implement dynamic language detection and real-time speech-to-text conversion using Azure Speech, you can use the Azure Cognitive Services Speech SDK with its continuous translation feature and AutoDetectSourceLanguageConfig.
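If it helps, here is a minimal sketch of how AutoDetectSourceLanguageConfig can be attached to a translation recognizer. The helper names (check_candidates, build_auto_detect_recognizer) and the candidate limits are my own assumptions for illustration, not part of the SDK; the limits reflect my reading of the language identification docs, so please verify them there. Depending on your SDK version, language identification with translation may also need the v2 endpoint instead of a region, which this sketch does not set.

```python
# Assumed limits on candidate languages (at-start vs. continuous detection);
# confirm them against the current language identification documentation.
MAX_CANDIDATES_AT_START = 4
MAX_CANDIDATES_CONTINUOUS = 10


def check_candidates(languages, continuous=False):
    """Return a de-duplicated list of candidate locales, or raise ValueError."""
    unique = list(dict.fromkeys(languages))  # keeps first-seen order
    limit = MAX_CANDIDATES_CONTINUOUS if continuous else MAX_CANDIDATES_AT_START
    if not unique:
        raise ValueError("at least one candidate language is required")
    if len(unique) > limit:
        raise ValueError(f"at most {limit} candidate languages are supported")
    return unique


def build_auto_detect_recognizer(speech_key, service_region, candidates, targets):
    """Create a TranslationRecognizer that detects the source language."""
    # Imported here so check_candidates() can be used without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    translation_config = speechsdk.translation.SpeechTranslationConfig(
        subscription=speech_key, region=service_region)
    for lang in targets:
        translation_config.add_target_language(lang)

    # The service picks the spoken language from these candidate locales.
    auto_detect_config = speechsdk.languageconfig.AutoDetectSourceLanguageConfig(
        languages=check_candidates(candidates))
    audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)

    return speechsdk.translation.TranslationRecognizer(
        translation_config=translation_config,
        auto_detect_source_language_config=auto_detect_config,
        audio_config=audio_config)
```

For example, build_auto_detect_recognizer(key, region, ["en-US", "hi-IN"], ["de", "fr"]) would return a recognizer that chooses between English and Hindi input and translates to German and French.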
For more information and detailed documentation, you can refer to the following Azure Speech SDK documentation:
Speech translation quickstart - Speech service - Azure AI services | Microsoft Learn
Language identification - Speech service - Azure AI services | Microsoft Learn
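Along with the above docs, here is the sample code I used to reproduce real-time speech-to-text conversion with translation. The code listens for speech input on the default microphone, recognizes it, and prints translations in German, French, and Hindi. Note that this sample sets the source language to en-US; to detect the spoken language dynamically, pass an AutoDetectSourceLanguageConfig to the recognizer. Modify the code as per your requirements.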
import azure.cognitiveservices.speech as speechsdk

# Replace these placeholders with your Speech resource key and region
speech_key, service_region = "YOUR_SPEECH_KEY", "YOUR_SPEECH_REGION"


def continuous_translation_from_microphone():
    # The source language is fixed to en-US in this sample
    translation_config = speechsdk.translation.SpeechTranslationConfig(
        subscription=speech_key,
        region=service_region,
        speech_recognition_language='en-US')
    audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)  # Use the default microphone

    # Add target languages to the translation configuration
    translation_config.add_target_language("de")
    translation_config.add_target_language("fr")
    translation_config.add_target_language("hi")

    recognizer = speechsdk.translation.TranslationRecognizer(
        translation_config=translation_config,
        audio_config=audio_config)

    print("Speak something... (Press Ctrl+C to stop)")
    try:
        while True:
            # recognize_once() blocks until a single utterance has been processed
            result = recognizer.recognize_once()

            # Check the result
            if result.reason == speechsdk.ResultReason.TranslatedSpeech:
                print(f"Recognized: {result.text}")
                print(f"German translation: {result.translations.get('de', '')}")
                print(f"French translation: {result.translations.get('fr', '')}")
                print(f"Hindi translation: {result.translations.get('hi', '')}")
            elif result.reason == speechsdk.ResultReason.RecognizedSpeech:
                print("Recognized: {}".format(result.text))
                # .get() avoids a KeyError when no detection result is present
                detected_src_lang = result.properties.get(
                    speechsdk.PropertyId.SpeechServiceConnection_AutoDetectSourceLanguageResult, "")
                print("Detected Language: {}".format(detected_src_lang))
            elif result.reason == speechsdk.ResultReason.NoMatch:
                print("No speech could be recognized: {}".format(result.no_match_details))
            elif result.reason == speechsdk.ResultReason.Canceled:
                print("Translation canceled: {}".format(result.cancellation_details.reason))
                if result.cancellation_details.reason == speechsdk.CancellationReason.Error:
                    print("Error details: {}".format(result.cancellation_details.error_details))
    except KeyboardInterrupt:
        print("Recognition stopped.")


def main():
    continuous_translation_from_microphone()


if __name__ == "__main__":
    main()
Output
Speak something... (Press Ctrl+C to stop)
Recognized: How are you?
German translation: Wie geht es dir?
French translation: Comment vas-tu?
Hindi translation: तुम कैसे हो?
Recognized: I am fine.
German translation: Es geht mir gut.
French translation: Je vais bien.
Hindi translation: मैं बढ़िया हूँ।
Recognition stopped.
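The sample above calls recognize_once() in a loop, which handles one utterance at a time. If you need uninterrupted streaming, a rough sketch using the SDK's continuous recognition events could look like the following. The helper names format_translations and run_continuous are hypothetical, and the recognizer argument is assumed to be a TranslationRecognizer created as in the sample; the detected language is only present if the recognizer was built with an AutoDetectSourceLanguageConfig.

```python
import time


def format_translations(text, translations, detected_language=""):
    """Build the lines to print for one translated utterance."""
    lines = []
    if detected_language:
        lines.append(f"Detected language: {detected_language}")
    lines.append(f"Recognized: {text}")
    for lang, translated in translations.items():
        lines.append(f"{lang}: {translated}")
    return lines


def run_continuous(recognizer):
    """Translate until Ctrl+C, printing each final result as it arrives."""
    # Imported here so format_translations() works without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    def on_recognized(evt):
        result = evt.result
        if result.reason == speechsdk.ResultReason.TranslatedSpeech:
            # Empty string when language detection is not configured
            detected = result.properties.get(
                speechsdk.PropertyId.SpeechServiceConnection_AutoDetectSourceLanguageResult, "")
            for line in format_translations(result.text, result.translations, detected):
                print(line)

    def on_canceled(evt):
        print(f"Canceled: {evt}")

    # Final results and cancellations are delivered through event callbacks.
    recognizer.recognized.connect(on_recognized)
    recognizer.canceled.connect(on_canceled)

    recognizer.start_continuous_recognition()
    try:
        while True:
            time.sleep(0.5)  # keep the main thread alive while callbacks run
    except KeyboardInterrupt:
        print("Recognition stopped.")
    finally:
        recognizer.stop_continuous_recognition()
```

The formatting is kept in a separate function so it can be adjusted or tested without a microphone or a Speech resource.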
Thank You!
Hope this helps.
If this answers your query, please click Accept Answer and Yes for "Was this answer helpful?".