Can Azure Text to Speech support more audio file formats, like MP3?

prashant nigam 21 Reputation points
2022-01-13T11:51:26.213+00:00

Does Azure Text to Speech only support WAV files?

WAV files are large (about 5x the size of the equivalent MP3), and since I have limited bandwidth, they are not suitable for my project.

I already tried some of the built-in Python libraries to convert .wav to .mp3, but none of them worked.

Other clouds, such as Google, provide an audio encoding option, so there we can get the output in the required format.


Accepted answer
  1. romungi-MSFT 41,961 Reputation points Microsoft Employee
    2022-01-13T14:11:19.483+00:00

    @prashantnigam-6347 Yes, the text to speech service supports other audio formats too. All supported audio formats are listed in the respective SDK references. For example, if you are using the Python SDK, the formats you can set are the members of the SpeechSynthesisOutputFormat enum.

    You can set the format using your speech config.

    import azure.cognitiveservices.speech as speechsdk

    # speech_key and service_region are your Speech resource key and region
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    # Request MP3 output
    speech_config.set_speech_synthesis_output_format(speechsdk.SpeechSynthesisOutputFormat.Audio16Khz32KBitRateMonoMp3)

    file_name = "outputaudio.mp3"
    file_config = speechsdk.audio.AudioOutputConfig(filename=file_name)
    speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=file_config)
    

    The samples in the SDK repo should help you set this up in your application.

    If you are using the REST API, you can set the format with the X-Microsoft-OutputFormat header. For example:

    curl --location --request POST 'https://INSERT_REGION_HERE.tts.speech.microsoft.com/cognitiveservices/v1' \  
    --header 'Ocp-Apim-Subscription-Key: INSERT_SUBSCRIPTION_KEY_HERE' \  
    --header 'Content-Type: application/ssml+xml' \  
    --header 'X-Microsoft-OutputFormat: audio-16khz-128kbitrate-mono-mp3' \  
    --header 'User-Agent: curl' \  
    --data-raw '<speak version='\''1.0'\'' xml:lang='\''en-US'\''>  
        <voice xml:lang='\''en-US'\'' xml:gender='\''Female'\'' name='\''en-US-JennyNeural'\''>  
            my voice is my passport verify me  
        </voice>  
    </speak>' > output.mp3  
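
Since the question mentions Python, the same REST call could be sketched with only the standard library, roughly as below. This is a minimal sketch, not an official sample: the function names `build_tts_request` and `synthesize_to_file` are hypothetical, the URL, headers, and SSML simply mirror the curl command above, and the region and key are placeholders you would supply yourself.

```python
import urllib.request
from xml.sax.saxutils import escape


def build_tts_request(region, subscription_key, text,
                      output_format="audio-16khz-128kbitrate-mono-mp3"):
    """Return (url, headers, body) mirroring the curl example.

    Hypothetical helper: the endpoint, header names, and SSML are copied
    from the curl command; nothing is sent over the network here.
    """
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"
    headers = {
        "Ocp-Apim-Subscription-Key": subscription_key,
        "Content-Type": "application/ssml+xml",
        # This header selects the audio format of the response.
        "X-Microsoft-OutputFormat": output_format,
        "User-Agent": "python-urllib",
    }
    ssml = (
        "<speak version='1.0' xml:lang='en-US'>"
        "<voice xml:lang='en-US' xml:gender='Female' name='en-US-JennyNeural'>"
        f"{escape(text)}"  # escape &, <, > so the SSML stays well-formed
        "</voice></speak>"
    )
    return url, headers, ssml.encode("utf-8")


def synthesize_to_file(region, subscription_key, text, path):
    """POST the SSML and write the returned audio bytes to `path`."""
    url, headers, body = build_tts_request(region, subscription_key, text)
    request = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

Calling `synthesize_to_file("INSERT_REGION_HERE", "INSERT_SUBSCRIPTION_KEY_HERE", "my voice is my passport verify me", "output.mp3")` would then correspond to the curl command, assuming a valid region and key.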
    
