SpeechTranslationConfig.setSpeechSynthesisOutputFormat() in Java SDK doesn't work

Slimek Wu

I'm trying the Java Speech SDK 1.14.0 for speech translation on Android. I use SpeechTranslationConfig to create a TranslationRecognizer:

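import com.microsoft.cognitiveservices.speech.SpeechSynthesisOutputFormat;
import com.microsoft.cognitiveservices.speech.audio.AudioConfig;
import com.microsoft.cognitiveservices.speech.translation.SpeechTranslationConfig;
import com.microsoft.cognitiveservices.speech.translation.TranslationRecognizer;

// Capture audio from the device's default microphone.
AudioConfig audio = AudioConfig.fromDefaultMicrophoneInput();

SpeechTranslationConfig config = SpeechTranslationConfig.fromSubscription(
        subscriptionKey,
        subscriptionRegion
);

// Recognize sourceLanguage, translate into targetLanguage, and synthesize the translation.
config.setSpeechRecognitionLanguage(sourceLanguage);
config.addTargetLanguage(targetLanguage);
config.setSpeechSynthesisLanguage(targetLanguage);
config.setVoiceName(voiceName);
// Raw PCM is requested here, but the synthesized audio still arrives as RIFF (WAV).
config.setSpeechSynthesisOutputFormat(SpeechSynthesisOutputFormat.Raw16Khz16BitMonoPcm);

this.recognizer = new TranslationRecognizer(config, audio);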

...

Everything works fine, except that no matter what I pass to setSpeechSynthesisOutputFormat(), the synthesized audio always comes back in Riff16Khz16BitMonoPcm format, which seems to be the default value.

Is there any other way to specify the speech synthesis output format?
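If the goal is just raw 16 kHz, 16-bit mono PCM, one possible client-side workaround while the setting is ignored is to strip the RIFF header yourself. The sketch below is plain Java, not a Speech SDK feature, and the class and method names (RiffPcmExtractor, extractPcm, describeFormat) are hypothetical. It assumes the complete RIFF buffer (header plus samples) is available in one byte array, and that the RIFF output carries the same PCM samples that Raw16Khz16BitMonoPcm would; describeFormat can be used to check that assumption.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Minimal RIFF/WAVE reader: returns the PCM payload and a summary of the "fmt " chunk. */
final class RiffPcmExtractor {

    private RiffPcmExtractor() {}

    /** Returns the raw PCM bytes stored in the "data" chunk. */
    static byte[] extractPcm(byte[] wav) {
        int[] range = findChunk(wav, "data");
        return Arrays.copyOfRange(wav, range[0], range[1]);
    }

    /** Returns a summary such as "16000 Hz, 16-bit, 1 ch" read from the "fmt " chunk. */
    static String describeFormat(byte[] wav) {
        int[] range = findChunk(wav, "fmt ");
        if (range[1] - range[0] < 16) {
            throw new IllegalArgumentException("fmt chunk too short");
        }
        ByteBuffer fmt = ByteBuffer.wrap(wav).order(ByteOrder.LITTLE_ENDIAN);
        int channels = fmt.getShort(range[0] + 2) & 0xFFFF;
        long sampleRate = fmt.getInt(range[0] + 4) & 0xFFFFFFFFL;
        int bitsPerSample = fmt.getShort(range[0] + 14) & 0xFFFF;
        return sampleRate + " Hz, " + bitsPerSample + "-bit, " + channels + " ch";
    }

    /**
     * Walks the chunk list after the 12-byte "RIFF....WAVE" preamble and returns
     * {payloadStart, payloadEnd} for the first chunk with the given id. Walking the
     * list avoids assuming a fixed 44-byte header.
     */
    private static int[] findChunk(byte[] wav, String wanted) {
        if (wav.length < 12 || !ascii(wav, 0).equals("RIFF") || !ascii(wav, 8).equals("WAVE")) {
            throw new IllegalArgumentException("not a RIFF/WAVE buffer");
        }
        ByteBuffer buf = ByteBuffer.wrap(wav).order(ByteOrder.LITTLE_ENDIAN);
        long pos = 12;
        while (pos + 8 <= wav.length) {
            String id = ascii(wav, (int) pos);
            long size = buf.getInt((int) pos + 4) & 0xFFFFFFFFL; // unsigned 32-bit size
            long start = pos + 8;
            if (id.equals(wanted)) {
                // Clamp in case a streaming writer left a placeholder size larger than the buffer.
                return new int[] {(int) start, (int) Math.min(wav.length, start + size)};
            }
            pos = start + size + (size & 1); // chunks are padded to an even length
        }
        throw new IllegalArgumentException("chunk not found: " + wanted);
    }

    private static String ascii(byte[] data, int offset) {
        return new String(data, offset, 4, StandardCharsets.US_ASCII);
    }
}
```

Walking the chunks rather than dropping the first 44 bytes is a deliberate choice: a WAV file may legally contain extra chunks before "data", and a fixed offset would then leave header bytes in the "PCM".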

romungi-MSFT, Microsoft Employee

2020-12-07T06:51:39.787+00:00

@Slimek Wu I think the output format in your case may be limited to the formats available for the voice you are using, which is why it falls back to Riff16Khz16BitMonoPcm.
Could you try a scenario with a different voice, language, and format? For example, this sample uses the following values with the same method:

VoiceName: BenjaminRUS
Format: Audio16Khz32KBitRateMonoMp3
Slimek Wu

2020-12-08T07:16:32.78+00:00
@romungi-MSFT Thanks for your response. But I should point out that the sample you provided is about the SpeechSynthesizer class, whereas my question is about the TranslationRecognizer class.

Both classes are configured in a similar way:

SpeechSynthesizer is configured with SpeechConfig

TranslationRecognizer is configured with SpeechTranslationConfig

When I use SpeechSynthesizer, the SpeechConfig.setSpeechSynthesisOutputFormat() works as expected.

But when I use TranslationRecognizer, SpeechTranslationConfig.setSpeechSynthesisOutputFormat() has no effect. This is the problem that has been bothering me :)
romungi-MSFT, Microsoft Employee

2020-12-08T08:11:17.377+00:00

@Slimek Wu Yes, SpeechTranslationConfig inherits the setSpeechSynthesisOutputFormat(SpeechSynthesisOutputFormat value) method from SpeechConfig, so it should behave the same way. I suspected the requested format might not be applied in your case because of the selected voice. Is the requested format applied when you use SpeechConfig with the same voice and language?
Slimek Wu

2020-12-09T16:41:21.997+00:00
@romungi-MSFT Hi, I have tried both combinations with the same configuration.

SpeechSynthesizer + SpeechConfig:

config.setSpeechSynthesisLanguage("en-US");
config.setSpeechSynthesisVoiceName("en-US-BenjaminRUS");
config.setSpeechSynthesisOutputFormat(SpeechSynthesisOutputFormat.Audio16Khz32KBitRateMonoMp3);

TranslationRecognizer + SpeechTranslationConfig:

config.setSpeechSynthesisLanguage("en-US");
config.setVoiceName("en-US-BenjaminRUS");
config.setSpeechSynthesisOutputFormat(SpeechSynthesisOutputFormat.Audio16Khz32KBitRateMonoMp3);

(PS: If I use setSpeechSynthesisVoiceName() to set the voice name, there is no voice output at all...)

The former gives me MP3, but the latter is still in RIFF format.

Based on this result, I suspect that neither setSpeechSynthesisVoiceName() nor setSpeechSynthesisOutputFormat() has any effect when configuring TranslationRecognizer.
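To tell MP3 from RIFF output reliably when comparing the two setups, it is enough to look at the first bytes of the synthesized audio (for example, the byte array returned by getAudio() on the result delivered to the recognizer's synthesizing event). The class below is a minimal plain-Java sketch, not part of the Speech SDK; AudioFormatSniffer and detectContainer are hypothetical names, and the MP3 frame-sync check is a heuristic, since raw PCM can start with the same bytes by chance.

```java
import java.nio.charset.StandardCharsets;

/** Heuristic check of which container a synthesized audio buffer uses. */
final class AudioFormatSniffer {

    private AudioFormatSniffer() {}

    /** Returns "RIFF/WAV", "MP3", or "unknown" based on the first bytes of the buffer. */
    static String detectContainer(byte[] audio) {
        if (audio == null || audio.length < 4) {
            return "unknown";
        }
        // A WAV file starts with "RIFF", a 4-byte chunk size, then "WAVE".
        if (audio.length >= 12
                && ascii(audio, 0, 4).equals("RIFF")
                && ascii(audio, 8, 4).equals("WAVE")) {
            return "RIFF/WAV";
        }
        // An MP3 stream usually starts with an ID3v2 tag...
        if (ascii(audio, 0, 3).equals("ID3")) {
            return "MP3";
        }
        // ...or directly with an MPEG frame header, whose first 11 bits are all set.
        // Raw PCM can begin with these bytes by chance, so this is only a heuristic.
        if ((audio[0] & 0xFF) == 0xFF && (audio[1] & 0xE0) == 0xE0) {
            return "MP3";
        }
        // Headerless data, e.g. raw PCM.
        return "unknown";
    }

    private static String ascii(byte[] data, int offset, int length) {
        return new String(data, offset, length, StandardCharsets.US_ASCII);
    }
}
```

Logging detectContainer(...) for the first audio buffer of each setup gives a direct, byte-level comparison that can also be attached to a bug report.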
romungi-MSFT, Microsoft Employee

2020-12-10T12:18:32.41+00:00

@Slimek Wu I believe this is a bug in the SDK: these synthesis settings are not carried over when used with TranslationRecognizer. Could you report it on our SDK issues page here? I have reviewed all open and closed issues, and this behavior has not been reported yet, except for auto language detection, which is a bug as mentioned in this issue.
