SpeechTranslationConfig.setSpeechSynthesisOutputFormat() in Java SDK doesn't work
I'm trying Java Speech SDK 1.14.0 for speech translation on Android. I use SpeechTranslationConfig to create a TranslationRecognizer:
AudioConfig audio = AudioConfig.fromDefaultMicrophoneInput();
SpeechTranslationConfig config = SpeechTranslationConfig.fromSubscription(subscriptionKey, subscriptionRegion);
config.setSpeechRecognitionLanguage(sourceLanguage);
config.addTargetLanguage(targetLanguage);
config.setSpeechSynthesisLanguage(targetLanguage);
config.setVoiceName(voiceName);
config.setSpeechSynthesisOutputFormat(SpeechSynthesisOutputFormat.Raw16Khz16BitMonoPcm);
this.recognizer = new TranslationRecognizer(config, audio);
...
Everything works, except that no matter what I pass to setSpeechSynthesisOutputFormat(), the audio always comes back in Riff16Khz16BitMonoPcm format, which seems to be the default value.
Is there any other way to specify the speech synthesis output format?
@Slimek Wu I think the output format in your case may be limited to the formats available for the voice you are using, which would explain why it is set to Riff16Khz16BitMonoPcm.
Could you try a scenario with a different voice and language and a different format? For example, this sample sets the format using the same method.
@romungi-MSFT Thanks for your response. But I have to mention that the sample you provided is about the SpeechSynthesizer class, and my question is about the TranslationRecognizer class.
Both classes have similar configuring approaches:
When I use SpeechSynthesizer, the SpeechConfig.setSpeechSynthesisOutputFormat() works as expected.
But when I use TranslationRecognizer, SpeechTranslationConfig.setSpeechSynthesisOutputFormat() has no effect. This is the problem that is bothering me :)
@Slimek Wu Yes, based on its implementation, the SpeechTranslationConfig class inherits SpeechConfig's setSpeechSynthesisOutputFormat(SpeechSynthesisOutputFormat value) method, so it should work the same way. I suspected the required format might not be applied in your case because of the selected voice. Does the required format get applied for the same voice and language when you use SpeechConfig?
@romungi-MSFT Hi, I have tried both combinations with the same configuration.
SpeechSynthesizer + SpeechConfig :
TranslationRecognizer + SpeechTranslationConfig :
(PS: If I use setSpeechSynthesisVoiceName() to set the voice name, there is no voice output at all...)
The former gives me MP3, but the latter is still in RIFF format.
Based on this result, I suspect that both setSpeechSynthesisVoiceName() and setSpeechSynthesisOutputFormat() have no effect when configuring TranslationRecognizer.
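One way to confirm which container actually comes back is to look at the first bytes of the synthesized audio. The following is a minimal sketch, assuming the audio has already been collected into a byte array by the application; AudioFormatSniffer and isRiffWave are hypothetical names for illustration and are not part of the Speech SDK.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the Speech SDK.
final class AudioFormatSniffer {
    private AudioFormatSniffer() {}

    /**
     * Returns true when the buffer starts with a RIFF/WAVE header:
     * the ASCII tag "RIFF" at offset 0 and "WAVE" at offset 8.
     * Raw PCM and MP3 data do not start this way.
     */
    static boolean isRiffWave(byte[] audio) {
        if (audio == null || audio.length < 12) {
            return false; // too short to hold the 12-byte RIFF preamble
        }
        return tag(audio, 0).equals("RIFF") && tag(audio, 8).equals("WAVE");
    }

    // Reads a 4-character ASCII chunk tag at the given offset.
    private static String tag(byte[] data, int offset) {
        return new String(data, offset, 4, StandardCharsets.US_ASCII);
    }
}
```

If isRiffWave returns true for audio requested as Raw16Khz16BitMonoPcm or as MP3, the requested format was not applied.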
@Slimek Wu I believe this is a bug in the SDK where the setting is not persisted when used with TranslationRecognizer. Could you report it on our SDK issues page here? I have reviewed all open and closed issues, and this behavior has not been reported yet, except for auto language detection, which is a bug as mentioned in this issue.
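Until the setting is honored, one possible workaround when raw PCM is needed is to strip the RIFF container on the client side. This is only a sketch under assumptions: it assumes the complete RIFF/WAVE data is available in one byte array, and RiffPcmExtractor and extractPcm are hypothetical names, not SDK APIs. It walks the chunk list rather than assuming a fixed 44-byte header, and it clamps the declared data size to the bytes actually present.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of the Speech SDK.
final class RiffPcmExtractor {
    private RiffPcmExtractor() {}

    /**
     * Returns the payload of the "data" chunk of a RIFF/WAVE buffer.
     * Input that does not start with a RIFF/WAVE header is returned unchanged.
     * Returns an empty array if no "data" chunk is found.
     */
    static byte[] extractPcm(byte[] wav) {
        if (wav == null || wav.length < 12
                || !tag(wav, 0).equals("RIFF") || !tag(wav, 8).equals("WAVE")) {
            return wav;
        }
        ByteBuffer buf = ByteBuffer.wrap(wav).order(ByteOrder.LITTLE_ENDIAN);
        int pos = 12; // first chunk follows the 12-byte RIFF preamble
        while (pos + 8 <= wav.length) {
            String id = tag(wav, pos);
            long size = buf.getInt(pos + 4) & 0xFFFFFFFFL; // unsigned 32-bit chunk size
            int start = pos + 8;
            if (id.equals("data")) {
                // Clamp in case the declared size exceeds the bytes present.
                long end = Math.min((long) wav.length, start + size);
                return Arrays.copyOfRange(wav, start, (int) end);
            }
            long next = start + size + (size & 1); // chunks are padded to even length
            if (next > wav.length) {
                break;
            }
            pos = (int) next;
        }
        return new byte[0];
    }

    // Reads a 4-character ASCII chunk tag at the given offset.
    private static String tag(byte[] data, int offset) {
        return new String(data, offset, 4, StandardCharsets.US_ASCII);
    }
}
```

The returned bytes keep the sample rate and bit depth of the original audio (16 kHz, 16-bit, mono for Riff16Khz16BitMonoPcm); only the header is removed.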