
Azure Speech Translation not translating properly

Miss Nansy 0 Reputation points
2025-10-07T19:29:31.4566667+00:00

Azure Speech Translation is returning only partial translations, and sometimes the microphone does not pick up the voice. It has also started producing more transcription output than translation when translating from English to the target language under Speech Translation.

Azure AI Speech

An Azure service that integrates speech processing into apps and services.


2 answers

  1. Divyesh Govaerdhanan 10,860 Reputation points Volunteer Moderator
    2025-10-07T23:13:01.6733333+00:00

    Hello Miss Nansy,

    Welcome to Microsoft Q&A,

    Speech Translation returns both the source transcript and the translations. In your handler, read result.Translations[<lang>] (or e.result.translations['fr'] in JS), not just result.Text.

    For longer speech, use StartContinuousRecognitionAsync() (or startContinuousRecognitionAsync in JS) instead of a single RecognizeOnce* call. Single-shot recognition ends after one utterance (at the first silence, or after roughly 30 s).

    https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-recognize-speech?pivots=programming-language-csharp

    Increase the segmentation silence timeout so short pauses don’t end the phrase too early (e.g., 1200–2000 ms): speechConfig.SetProperty(PropertyId.Speech_SegmentationSilenceTimeoutMs, "1500"); Valid range: 100–5000 ms.

    https://learn.microsoft.com/en-us/python/api/azure-cognitiveservices-speech/azure.cognitiveservices.speech.translation.translationrecognizer?view=azure-python
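The same property can be set from the Python SDK via set_property; a configuration fragment as a sketch, assuming the azure-cognitiveservices-speech package and placeholder credentials:

```python
import azure.cognitiveservices.speech as speechsdk

translation_config = speechsdk.translation.SpeechTranslationConfig(
    subscription="<YOUR-SPEECH-KEY>", region="<DEPLOYMENT-REGION>"
)
# Allow pauses of up to 1.5 s before the service closes the phrase
# (valid range per the docs: 100-5000 ms).
translation_config.set_property(
    speechsdk.PropertyId.Speech_SegmentationSilenceTimeoutMs, "1500"
)
```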

    Please upvote and accept the answer if it helps!

    1 person found this answer helpful.

  2. Aryan Parashar 3,690 Reputation points Microsoft External Staff Moderator
    2025-10-08T10:10:07.0433333+00:00

    Hi Miss Nansy,

    I've tested the Speech Translation service deployed in eastus2 on my end, and it is working fine. You can use the code below to test it on your end:

    import azure.cognitiveservices.speech as speechsdk
    import threading
    import time
    
    speech_key = "<YOUR-SPEECH-KEY>"
    service_region = "<DEPLOYMENT-REGION>"
    
    # Configure translation: recognize English speech, translate to Spanish.
    translation_config = speechsdk.translation.SpeechTranslationConfig(
        subscription=speech_key,
        region=service_region
    )
    translation_config.speech_recognition_language = "en-US"
    translation_config.add_target_language("es")  # Spanish
    
    # Capture audio from the default system microphone.
    audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)
    
    translator = speechsdk.translation.TranslationRecognizer(
        translation_config=translation_config,
        audio_config=audio_config
    )
    
    last_speech_time = time.time()
    silence_timeout = 5  # stop after this many seconds of silence
    stop_flag = False
    
    output_log = []
    
    def recognizing_handler(evt):
        """Fires with interim hypotheses while the user is still speaking."""
        global last_speech_time
        if evt.result.text.strip():
            last_speech_time = time.time()
            print(f"Recognizing: {evt.result.text}", end="\r")
    
    def recognized_handler(evt):
        """Fires once per finalized phrase; logs the transcript and every translation."""
        global last_speech_time
        text = evt.result.text.strip()
        if text:
            last_speech_time = time.time()
            print(f"\nRecognized: {text}")
            entry = f"Input: {text}\n"
            # evt.result.translations maps each target language to its translation.
            for lang, translation in evt.result.translations.items():
                translated_text = f"Output ({lang}): {translation}"
                print(translated_text)
                entry += translated_text + "\n"
            output_log.append(entry)
    
    def silence_watcher():
        """Background thread: stop recognition after silence_timeout seconds of quiet."""
        global stop_flag
        while not stop_flag:
            time.sleep(1)
            if time.time() - last_speech_time > silence_timeout:
                print(f"\nNo speech detected for {silence_timeout}s - stopping translation.")
                translator.stop_continuous_recognition()
                stop_flag = True
                break
    
    translator.recognizing.connect(recognizing_handler)
    translator.recognized.connect(recognized_handler)
    
    print("Speak into your microphone...")
    print("(It will keep listening and stop 5 seconds after your last word.)")
    
    # Continuous recognition keeps listening across multiple utterances.
    translator.start_continuous_recognition()
    
    watcher_thread = threading.Thread(target=silence_watcher)
    watcher_thread.daemon = True
    watcher_thread.start()
    
    # Keep the main thread alive until the watcher signals a stop.
    while not stop_flag:
        time.sleep(0.1)
    
    print("Translation session ended.")
    
    # Persist the transcript/translation pairs to a text file.
    with open("translation_output.txt", "w", encoding="utf-8") as f:
        for entry in output_log:
            f.write(entry)
            f.write("\n" + "-"*40 + "\n")
    
    print("Output saved to translation_output.txt")
    
    

    The above code will keep listening continuously and will stop after 5 seconds of silence.

    Here is the supported documentation:
    https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-translate-speech?tabs=terminal&pivots=programming-language-python

    You can also check Audio Seconds translated on the Azure Portal, as shown below:

    [Screenshot: Audio Seconds Translated metric in the Azure portal]

    Feel free to accept this as the answer.

    Thank you for reaching out to the Microsoft Q&A portal.

