How can I obtain the audio file from a text to speech resource in Azure Speech Services?

Cristian Camilo Bonilla Tellez 25 Reputation points
2023-05-18T01:13:18.9633333+00:00

Good Evening,

I would like to know if there is a way to obtain an audio file, such as .wav or .mp3, from the text to speech service using Python or C# code. When I call the text to speech API, the text is played on my PC with the voice selected in the request, but I need the file.

Thank you for your help.

Azure AI Speech
An Azure service that integrates speech processing into apps and services.
Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.

Accepted answer
  1. romungi-MSFT 48,911 Reputation points Microsoft Employee Moderator
    2023-05-18T10:07:43.27+00:00

    @Cristian Camilo Bonilla Tellez Yes, you can synthesize text to an audio file by passing an AudioOutputConfig with a file name to the SpeechSynthesizer class. Here is a sample that writes the output to a .wav file.

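        import azure.cognitiveservices.speech as speechsdk

        # Placeholders: replace with your own subscription key and service region.
        speech_key, service_region = "YourSubscriptionKey", "YourServiceRegion"

        # Performs speech synthesis to a wave file.
        # Creates an instance of a speech config with specified subscription key and service region.
        speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)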
        # Creates a speech synthesizer using file as audio output.
        # Replace with your own audio file name.
        file_name = "outputaudio.wav"
        file_config = speechsdk.audio.AudioOutputConfig(filename=file_name)
        speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=file_config)
    
        # Receives a text from console input and synthesizes it to wave file.
        while True:
            print("Enter some text that you want to synthesize, Ctrl-Z (Windows) or Ctrl-D (Linux/macOS) to exit")
            try:
                text = input()
            except EOFError:
                break
            result = speech_synthesizer.speak_text_async(text).get()
            # Check result
            if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
                print("Speech synthesized for text [{}], and the audio was saved to [{}]".format(text, file_name))
            elif result.reason == speechsdk.ResultReason.Canceled:
                cancellation_details = result.cancellation_details
                print("Speech synthesis canceled: {}".format(cancellation_details.reason))
                if cancellation_details.reason == speechsdk.CancellationReason.Error:
                    print("Error details: {}".format(cancellation_details.error_details))
    
    

    You can find the complete sample for this snippet, along with other possible scenarios, in the Speech SDK GitHub repo.
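
The question also mentions MP3. The following is a minimal sketch, not part of the original sample, of one way to get either format: request an output format on the speech config, synthesize with `audio_config=None` so nothing is played on the speakers, and write the returned bytes to disk. It assumes the Speech SDK's `set_speech_synthesis_output_format` method, the `SpeechSynthesisOutputFormat` members named in the mapping, and the `audio_data` property of the result. The helper names `output_format_name` and `synthesize_to_file` are hypothetical. The SDK is imported inside the function so the extension helper can be used without it.

```python
import os

# Maps a file extension to the name of a SpeechSynthesisOutputFormat member.
# Only two formats are listed here; the SDK offers more.
_FORMAT_BY_EXTENSION = {
    ".wav": "Riff16Khz16BitMonoPcm",
    ".mp3": "Audio16Khz32KBitRateMonoMp3",
}


def output_format_name(file_name):
    """Return the output format name that matches the file extension."""
    extension = os.path.splitext(file_name)[1].lower()
    if extension not in _FORMAT_BY_EXTENSION:
        raise ValueError("Unsupported audio extension: {!r}".format(extension))
    return _FORMAT_BY_EXTENSION[extension]


def synthesize_to_file(text, file_name, speech_key, service_region):
    """Synthesize text and write the audio bytes to file_name.

    Returns True when synthesis completed, False otherwise.
    """
    # Imported here so the helper above works without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    output_format = getattr(
        speechsdk.SpeechSynthesisOutputFormat, output_format_name(file_name)
    )
    speech_config.set_speech_synthesis_output_format(output_format)

    # audio_config=None keeps the audio in memory instead of playing it.
    synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=None)
    result = synthesizer.speak_text_async(text).get()
    if result.reason != speechsdk.ResultReason.SynthesizingAudioCompleted:
        return False

    # result.audio_data holds the synthesized audio in the requested format.
    with open(file_name, "wb") as audio_file:
        audio_file.write(result.audio_data)
    return True
```

With a valid key and region, a call such as `synthesize_to_file("Hello", "outputaudio.mp3", speech_key, service_region)` should leave an MP3 file in the working directory.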

    To use a particular voice, set its name on the speech config, choosing from the voices available in your region. You can then use the same audio config to synthesize to a file.

        voice = "Microsoft Server Speech Text to Speech Voice (en-US, JennyNeural)"
        speech_config.speech_synthesis_voice_name = voice
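
The voice can also be chosen per request through SSML instead of on the speech config. The sketch below is illustrative only: `build_ssml` and `synthesize_ssml_to_file` are hypothetical helper names, and the short voice name `en-US-JennyNeural` in the usage note is an example that must be available in your region. It assumes the SDK's `speak_ssml_async` method and writes to a file through the same `AudioOutputConfig` approach as the sample above.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice_name, language="en-US"):
    """Return an SSML document that speaks text with the given voice."""
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        "xml:lang={lang}>"
        "<voice name={voice}>{body}</voice>"
        "</speak>"
    ).format(
        lang=quoteattr(language),
        voice=quoteattr(voice_name),
        body=escape(text),  # escapes &, < and > so the text stays valid XML
    )


def synthesize_ssml_to_file(ssml, file_name, speech_key, service_region):
    """Synthesize an SSML document to file_name; return True on success."""
    # Imported here so build_ssml works without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    file_config = speechsdk.audio.AudioOutputConfig(filename=file_name)
    synthesizer = speechsdk.SpeechSynthesizer(
        speech_config=speech_config, audio_config=file_config
    )
    result = synthesizer.speak_ssml_async(ssml).get()
    return result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted
```

For example, `build_ssml("Hello there", "en-US-JennyNeural")` produces the document to pass to `synthesize_ssml_to_file`. Escaping the text matters because a raw `&` or `<` in user input would make the SSML invalid.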
    
    

    If this answers your query, please click Accept Answer and Yes for "Was this answer helpful". If you have any further queries, do let us know.

    2 people found this answer helpful.