Generate audio file from speech

Question

Albert Duran 6

Hi,

I am currently working on an app that uses both Speech to Text and Text to Speech. When we ask the user a question, we use Speech to Text to get the answer as text, but we also need an audio file (.mp3 or .wav) of that answer. Do you know how I can achieve this?

Thank you!

Albert

2 answers

Answer 1

Oxueillirep 131

Hi,
You can find this in the official documentation: https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-speech-synthesis?tabs=browserjs%2Cterminal&pivots=programming-language-python#synthesize-speech-to-a-file (the page also covers other programming languages).

regards

Answer 2

Albert Duran 6

Thanks for your reply. I've checked the link and tried to implement it, but it does not seem to work for me.

What I need is to get the recorded voice in an audio file (a .wav, for example), but, if I am not mistaken, SpeechSynthesizer only accepts a string (SpeakTextAsync/SpeakSsmlAsync). Let me know if I am wrong; if I'm not, I would truly appreciate any help with this.

Henry Gao 0 Reputation points

2023-01-21T19:31:24.46+00:00

Hi!

I am also currently trying to build an app using STT and TTS, and I hope this answer is not too late! SpeechSynthesizer does take just a string, but it is also configured with which voice to use and where to store the output file (if you tell it to).

The official documentation includes sample code:

audio_config = speechsdk.audio.AudioOutputConfig(filename="path/to/write/file.wav")

In this case, the synthesizer will store the file in the designated path.
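For context, here is a minimal sketch of how that line might fit into a complete call. The function name `synthesize_to_file` and the environment variable names `SPEECH_KEY` and `SPEECH_REGION` are assumptions for illustration, not something taken from this thread, and the SDK import is deferred so the file loads even where the package is not installed.

```python
import os


def synthesize_to_file(text, path, key=None, region=None):
    """Synthesize `text` to an audio file at `path`; return True on success.

    Hypothetical helper: SPEECH_KEY and SPEECH_REGION are assumed
    environment variable names, used only when key/region are not passed.
    """
    # Imported lazily so this module loads without the Speech SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(
        subscription=key or os.environ["SPEECH_KEY"],
        region=region or os.environ["SPEECH_REGION"],
    )
    # Send the synthesized audio to a file instead of the default speaker.
    audio_config = speechsdk.audio.AudioOutputConfig(filename=path)
    synthesizer = speechsdk.SpeechSynthesizer(
        speech_config=speech_config, audio_config=audio_config
    )
    # speak_text_async returns a future; get() blocks until synthesis ends.
    result = synthesizer.speak_text_async(text).get()
    return result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted
```

Calling `synthesize_to_file("Hello", "path/to/write/file.wav")` with valid credentials should leave the spoken text in that file.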

To edit the output format, here is a sample:

speech_config.set_speech_synthesis_output_format(speechsdk.SpeechSynthesisOutputFormat.Riff24Khz16BitMonoPcm)
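Note that these snippets save synthesized (TTS) audio. For the original question, keeping the user's spoken answer, one possible approach is for the app to capture the microphone audio itself and write those same bytes to a .wav file alongside sending them to the recognizer. This is an assumption on my part, not something from the documentation linked in this thread. A minimal sketch using only Python's standard `wave` module; the helper name `save_pcm_as_wav` and the 16 kHz, 16-bit, mono defaults are assumptions:

```python
import wave


def save_pcm_as_wav(pcm_bytes, path, sample_rate=16000, channels=1, sample_width=2):
    """Write raw PCM audio bytes to a .wav file (hypothetical helper).

    The defaults assume 16 kHz, 16-bit (2 bytes per sample), mono audio;
    adjust them to match whatever your capture code actually produces.
    """
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)


# Stand-in for captured audio: one second of 16-bit mono silence.
save_pcm_as_wav(b"\x00\x00" * 16000, "answer.wav")
```

In a real app, `pcm_bytes` would be the audio captured from the microphone rather than silence.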

Hope this helps!
