Speech to Speech Translation

Chris Hooper 1 Microsoft Employee

Is it possible to use Cognitive Services to perform speech-to-speech translation of a video file or streamed video? If so, where do I start? Most of the documentation leads to speech-to-text translation.

Thanks,
Chris

2 answers

Answer 1

Stu Kennedy 6

What is the API for doing speech-to-speech ...

The only way I have found is to use sdk.TranslationRecognizer to go from audio to translated text, and then sdk.SpeechSynthesizer to synthesize the translated text.
Is there a way to do it in one step?

Also, the TranslationRecognizer seems to give up early (at 30 seconds) or when it encounters a pause in the audio.
How do I get it to process the whole file and keep going past pauses?
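
On the second point, a likely cause is single-shot recognition: in the Speech SDK, recognize_once returns after the first utterance, so it ends at a pause or after a short maximum duration. Continuous recognition keeps going until it is stopped. Below is a minimal Python sketch (Python to match the code posted elsewhere in this thread; the JavaScript SDK has a counterpart, startContinuousRecognitionAsync). It assumes the recognizer exposes the Python SDK's recognized, session_stopped and canceled events plus start_continuous_recognition / stop_continuous_recognition. collect_translations is a hypothetical helper name, not part of the SDK, and the recognizer is passed in so the sketch does not depend on any particular key or file.

```python
import threading


def collect_translations(recognizer, to_language, timeout=None):
    """Run continuous recognition and return the final translations in order.

    `recognizer` is assumed to behave like the Speech SDK's
    TranslationRecognizer (this helper is hypothetical, not an SDK function).
    """
    done = threading.Event()
    translations = []

    def on_recognized(evt):
        # Fires once per finished utterance; a pause does not end the session.
        text = evt.result.translations.get(to_language)
        if text:
            translations.append(text)

    def on_finished(evt):
        # Raised when the session stops or is canceled (e.g. end of file).
        done.set()

    recognizer.recognized.connect(on_recognized)
    recognizer.session_stopped.connect(on_finished)
    recognizer.canceled.connect(on_finished)

    recognizer.start_continuous_recognition()
    done.wait(timeout)  # block until the input has been processed or timeout
    recognizer.stop_continuous_recognition()
    return translations
```

With the SDK, the recognizer would typically be built as speechsdk.translation.TranslationRecognizer(translation_config=..., audio_config=speechsdk.audio.AudioConfig(filename=...)) so that the whole file is read. As far as I know this is still two steps (translate, then synthesize), not a single call.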

Chen Miracle 21 Reputation points

2022-10-26T02:09:38.433+00:00

I have the same case. When using speech to text followed by text to speech, continuous speech-to-text translation can produce multiple results, each of which triggers a speech synthesis call, and that can cause confusion. A direct speech-to-speech API would be better.
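
One possible way to keep multiple results from overlapping in synthesis is to queue only the final results and speak them one at a time on a single worker thread. A minimal sketch, assuming `synthesize` is any blocking callable that speaks one piece of text; SerialSpeaker is a hypothetical name, not an SDK class.

```python
import queue
import threading


class SerialSpeaker:
    """Speak queued texts one at a time, in arrival order (hypothetical helper)."""

    def __init__(self, synthesize):
        self._synthesize = synthesize  # blocking callable: text -> None
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def say(self, text):
        # Call this only from the final-result callback, so interim
        # results never reach the synthesizer.
        if text:
            self._queue.put(text)

    def close(self):
        # Finish everything already queued, then stop the worker.
        self._queue.put(None)
        self._worker.join()

    def _run(self):
        while True:
            text = self._queue.get()
            if text is None:
                break
            self._synthesize(text)
```

With the Python Speech SDK, `synthesize` could be something like lambda text: synthesizer.speak_text_async(text).get(), where synthesizer is a SpeechSynthesizer. The trade-off is that speech can lag behind the speaker if results arrive faster than they can be spoken.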

Answer 2

YutongTie-MSFT 53,976 Moderator

Hello,

Thanks for reaching out to us. There is a service called Speech Translation under the Azure Speech service: https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-translation

The document above describes the benefits and capabilities of the speech translation service, which enables real-time, multi-language speech-to-speech and speech-to-text translation of audio streams. With the Speech SDK, your applications, tools, and devices have access to source transcriptions and translation outputs for the provided audio. Interim transcription and translation results are returned as speech is detected, and final results can be converted into synthesized speech.

Hope this helps.

Regards,
Yutong

Chris Hooper 1 Reputation point Microsoft Employee

2021-04-12T22:42:49.817+00:00

Thanks for the direction. I got started and was able to run speech recognition. Many thanks.

I then tried to run code to translate, but received the following error:

[Running] python -u "c:\Users\chris\OneDrive\Documents\IntroTo Python Development\Translate2.py"
Traceback (most recent call last):
  File "c:\Users\chris\OneDrive\Documents\IntroTo Python Development\Translate2.py", line 4, in <module>
    speech_key, service_region = os.environ['56f56d2e6c1d4777bc2c9ede17ee308d'], os.environ['eastus']
  File "C:\Users\chris\AppData\Local\Programs\Python\Python39\lib\os.py", line 679, in __getitem__
    raise KeyError(key) from None
KeyError: '56f56d2e6c1d4777bc2c9ede17ee308d'

Execute Code:

import os
import azure.cognitiveservices.speech as speechsdk

speech_key, service_region = os.environ['56f56d2e6c1d4777bc2c9ede17ee308d'], os.environ['eastus']
from_language, to_language = 'en-US', 'de'

def translate_speech_to_text():
    translation_config = speechsdk.translation.SpeechTranslationConfig(
        subscription=speech_key, region=service_region)

    translation_config.speech_recognition_language = from_language
    translation_config.add_target_language(to_language)

    # See: https://aka.ms/speech/sdkregion#standard-and-neural-voices
    translation_config.voice_name = 'de-DE-Hedda'

    recognizer = speechsdk.translation.TranslationRecognizer(
        translation_config=translation_config)

    def synthesis_callback(evt):
        # Called with synthesized audio; an empty chunk marks completion.
        size = len(evt.result.audio)
        print(f'Audio synthesized: {size} byte(s) {"(COMPLETED)" if size == 0 else ""}')

        if size > 0:
            file = open('translation.wav', 'wb+')
            file.write(evt.result.audio)
            file.close()

    recognizer.synthesizing.connect(synthesis_callback)

    print(f'Say something in "{from_language}" and we\'ll translate into "{to_language}".')

    result = recognizer.recognize_once()
    print(get_result_text(reason=result.reason, result=result))

def get_result_text(reason, result):
    reason_format = {
        speechsdk.ResultReason.TranslatedSpeech:
            f'Recognized "{from_language}": {result.text}\n' +
            f'Translated into "{to_language}": {result.translations[to_language]}',
        speechsdk.ResultReason.RecognizedSpeech: f'Recognized: "{result.text}"',
        speechsdk.ResultReason.NoMatch: f'No speech could be recognized: {result.no_match_details}',
        speechsdk.ResultReason.Canceled: f'Speech Recognition canceled: {result.cancellation_details}'
    }
    return reason_format.get(reason, 'Unable to recognize speech')

translate_speech_to_text()
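
A note on the KeyError in the traceback: os.environ[...] looks up an environment variable by name, and in the script the key value and the region value are passed as if they were variable names, so the lookup fails. A minimal sketch of one possible fix, assuming the values are stored in environment variables named SPEECH_KEY and SPEECH_REGION (these names are an assumption, any names would do); load_speech_settings is a hypothetical helper, not part of the SDK.

```python
import os


def load_speech_settings(env=os.environ):
    """Return (key, region) read from environment variables.

    SPEECH_KEY and SPEECH_REGION are assumed names, not required by the SDK.
    """
    missing = [name for name in ('SPEECH_KEY', 'SPEECH_REGION') if not env.get(name)]
    if missing:
        raise RuntimeError('Set these environment variables first: ' + ', '.join(missing))
    return env['SPEECH_KEY'], env['SPEECH_REGION']
```

In the script, line 4 would then become speech_key, service_region = load_speech_settings(). Assigning the two strings directly also avoids the error, but keeping the key out of source code and out of forum posts is safer.
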
Hritik Kawale 0 Reputation points

2025-03-01T09:30:03.39+00:00

I am using Azure real-time speech-to-speech translation, which is working fine, but there is a latency issue: the latency increases exponentially. Is there any way to reduce it?
