Can Azure Text to Speech support more audio file formats, like MP3?

Question

Does Azure Text to Speech only support WAV files?

WAV files are large (about 5x the size of MP3), and since I have limited bandwidth they are not suitable for my project.

I already tried some of the built-in Python libraries to convert .wav to .mp3, but none of them worked.

Other clouds, such as Google, provide an audio encoding option, so there you can get the required format directly.

Accepted Answer

@prashantnigam-6347 Yes, the Text to Speech service supports other audio formats too, including MP3. The supported formats are listed in each SDK's reference. For example, if you are using the Python SDK, the formats you can set are the members of the SpeechSynthesisOutputFormat enum.
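As a rough guide to how much bandwidth each format saves, the format names themselves encode the bit rate. The sketch below is only back-of-the-envelope arithmetic: it assumes constant bit rate, ignores container headers, and the helper name estimated_bytes is hypothetical rather than part of any SDK.

```python
# Approximate bit rates in kbit/s, derived from the format names.
# Uncompressed PCM: 16,000 samples/s * 16 bits * 1 channel = 256 kbit/s.
BITRATES_KBPS = {
    "Riff16Khz16BitMonoPcm": 256,
    "Audio16Khz32KBitRateMonoMp3": 32,
    "Audio16Khz128KBitRateMonoMp3": 128,
}


def estimated_bytes(format_name, seconds):
    """Estimate the audio payload size, ignoring headers and framing."""
    bytes_per_second = BITRATES_KBPS[format_name] * 1000 // 8
    return int(bytes_per_second * seconds)


if __name__ == "__main__":
    for name in BITRATES_KBPS:
        print(name, estimated_bytes(name, 60), "bytes per minute")
```

Under these assumptions, one minute of 16 kHz PCM is about 1.9 MB, while the 32 kbit/s MP3 format is about 0.24 MB, roughly an 8x reduction.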

You can set the format using your speech config.

import azure.cognitiveservices.speech as speechsdk

# speech_key and service_region are your Speech resource key and region.
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
# Request MP3 output instead of WAV.
speech_config.set_speech_synthesis_output_format(speechsdk.SpeechSynthesisOutputFormat.Audio16Khz32KBitRateMonoMp3)

file_name = "outputaudio.mp3"
file_config = speechsdk.audio.AudioOutputConfig(filename=file_name)
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=file_config)

The samples in the SDK repo should help you set this up in your application.
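To tie the pieces together, here is a minimal sketch of a complete synthesis call that picks the output format from the file extension and checks the result. The helper names (format_name_for, synthesize_to_file) and the extension mapping are assumptions made for illustration; only the speechsdk calls come from the SDK, which must be installed separately (pip install azure-cognitiveservices-speech).

```python
import os

# Hypothetical mapping from file extension to a SpeechSynthesisOutputFormat member name.
FORMAT_BY_EXTENSION = {
    ".mp3": "Audio16Khz32KBitRateMonoMp3",
    ".wav": "Riff16Khz16BitMonoPcm",
    ".ogg": "Ogg16Khz16BitMonoOpus",
}


def format_name_for(path):
    """Return the enum member name matching the extension of path."""
    extension = os.path.splitext(path)[1].lower()
    if extension not in FORMAT_BY_EXTENSION:
        raise ValueError(f"No output format configured for {extension!r}")
    return FORMAT_BY_EXTENSION[extension]


def synthesize_to_file(text, path, speech_key, service_region):
    """Synthesize text to path in the format implied by its extension."""
    # Imported here so the helper above works without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    output_format = getattr(speechsdk.SpeechSynthesisOutputFormat, format_name_for(path))
    speech_config.set_speech_synthesis_output_format(output_format)

    file_config = speechsdk.audio.AudioOutputConfig(filename=path)
    synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=file_config)

    # speak_text_async returns a future; get() blocks until synthesis finishes.
    result = synthesizer.speak_text_async(text).get()
    if result.reason != speechsdk.ResultReason.SynthesizingAudioCompleted:
        raise RuntimeError(f"Synthesis did not complete: {result.reason}")
    return path
```

For example, synthesize_to_file("my voice is my passport verify me", "outputaudio.mp3", speech_key, service_region) would write an MP3 file, assuming valid credentials.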

If you are using the REST API, you can set the format with the X-Microsoft-OutputFormat header. For example:

curl --location --request POST 'https://INSERT_REGION_HERE.tts.speech.microsoft.com/cognitiveservices/v1' \
--header 'Ocp-Apim-Subscription-Key: INSERT_SUBSCRIPTION_KEY_HERE' \
--header 'Content-Type: application/ssml+xml' \
--header 'X-Microsoft-OutputFormat: audio-16khz-128kbitrate-mono-mp3' \
--header 'User-Agent: curl' \
--data-raw '<speak version="1.0" xml:lang="en-US">
    <voice name="INSERT_VOICE_NAME_HERE">
        my voice is my passport verify me
    </voice>
</speak>' > output.mp3
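The same REST call can be sketched in Python using only the standard library. This is an illustration, not an official client: the function names build_tts_request and save_audio are hypothetical, and the voice name passed in is whatever voice your resource supports.

```python
import urllib.request
from xml.sax.saxutils import escape


def build_tts_request(region, subscription_key, text, voice,
                      lang="en-US",
                      output_format="audio-16khz-128kbitrate-mono-mp3"):
    """Build (but do not send) a POST request for the Text to Speech REST endpoint."""
    # Escape the text so characters such as & and < do not break the SSML.
    ssml = (
        f"<speak version='1.0' xml:lang='{lang}'>"
        f"<voice name='{voice}'>{escape(text)}</voice>"
        "</speak>"
    )
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"
    headers = {
        "Ocp-Apim-Subscription-Key": subscription_key,
        "Content-Type": "application/ssml+xml",
        # This header selects the audio format of the response body.
        "X-Microsoft-OutputFormat": output_format,
        "User-Agent": "python-urllib",
    }
    return urllib.request.Request(url, data=ssml.encode("utf-8"),
                                  headers=headers, method="POST")


def save_audio(request, path):
    """Send the request and write the returned audio bytes to path."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

Usage would look like save_audio(build_tts_request("INSERT_REGION_HERE", "INSERT_SUBSCRIPTION_KEY_HERE", "my voice is my passport verify me", "INSERT_VOICE_NAME_HERE"), "output.mp3"). Building the request separately from sending it makes it easy to inspect the headers before any network call.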

