How to synthesize speech with Azure Speech Synthesis without playing the audio

Fabian Lechner 0 Reputation points
2023-04-25T16:59:19.14+00:00

Hello,

I have the following Python function that synthesizes text into an array to be played later. However, the function itself always speaks the text. Is there a way to tell it not to use the audio output and just synthesize silently?

import azure.cognitiveservices.speech as speechsdk
import numpy as np

def synthesize_speech_to_array(text, subscription_key, region):
    # Configure the Azure Text to Speech instance
    speech_config = speechsdk.SpeechConfig(subscription=subscription_key, region=region)
    audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)
    speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)

    # Synthesize text to speech
    print("speaking")
    result = speech_synthesizer.speak_text_async(text).get()
    print("stopped")
    # Convert the resulting audio to a NumPy array and normalize the samples
    wav_data = result.audio_data
    wavarr = np.frombuffer(wav_data, dtype=np.int16).astype(np.float32) / 32767.0

    return wavarr
Azure AI services

2 answers

  1. Alonso Carrasco Gonzalez 16 Reputation points
    2023-05-29T15:08:08.16+00:00

    I am working in JavaScript, and this worked for me: instead of passing an audio config, I passed null, like this:

    const synthesizer = new SpeechSDK.SpeechSynthesizer(speechConfig, null);
    

    Since you are working with Python, I guess it would look something like this:

    speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=None)
    

    and the rest should be the same. Tell us if it worked.
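
Putting the suggestion above together with the function from the question, a minimal sketch might look like the following. The helper name pcm16_bytes_to_float is hypothetical, and two things are assumptions rather than confirmed SDK behaviour: that the output is 16-bit little-endian PCM, and that result.audio_data may start with a RIFF/WAV header that should be skipped before converting the samples.

```python
import numpy as np


def pcm16_bytes_to_float(audio_bytes: bytes) -> np.ndarray:
    """Convert 16-bit little-endian PCM bytes to float32 samples in [-1.0, 1.0)."""
    # Assumption: if the buffer starts with a RIFF/WAV header, the samples
    # begin 8 bytes after the "data" chunk marker (4-byte tag + 4-byte size).
    if audio_bytes[:4] == b"RIFF":
        idx = audio_bytes.find(b"data")
        if idx != -1:
            audio_bytes = audio_bytes[idx + 8:]
    # Drop a trailing odd byte so the buffer is a whole number of samples.
    usable = len(audio_bytes) - (len(audio_bytes) % 2)
    samples = np.frombuffer(audio_bytes[:usable], dtype="<i2")
    return samples.astype(np.float32) / 32768.0


def synthesize_speech_to_array(text, subscription_key, region):
    # Imported here so the helper above can be used without the Speech SDK.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=subscription_key, region=region)
    # audio_config=None: synthesize in memory instead of using the default speaker.
    synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=None)
    result = synthesizer.speak_text_async(text).get()
    if result.reason != speechsdk.ResultReason.SynthesizingAudioCompleted:
        raise RuntimeError(f"Synthesis did not complete: {result.reason}")
    return pcm16_bytes_to_float(result.audio_data)
```

The sketch divides by 32768 rather than 32767 so that the most negative sample maps to exactly -1.0; either choice works for playback.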

    3 people found this answer helpful.

  2. romungi-MSFT 43,696 Reputation points Microsoft Employee
    2023-04-26T07:26:58.9733333+00:00

    @Fabian Lechner I think you can use an audio config that synthesizes to a file instead of the default speaker on your device. The audio is then not played; it is stored in a file, and you can still return the audio data to the calling method as you do now.

    Replace the audio config with the following and pass file_config as the audio_config argument of the synthesizer; I think you should be good.

    file_name = "outputaudio.wav"
    file_config = speechsdk.audio.AudioOutputConfig(filename=file_name)
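
If the array should come from the written file rather than from result.audio_data, the file can be read back with Python's standard wave module. This is only a sketch: the helper name wav_file_to_float_array is hypothetical, and it assumes the file holds 16-bit PCM audio, which is not confirmed anywhere in this thread.

```python
import wave

import numpy as np


def wav_file_to_float_array(path):
    """Read a 16-bit PCM WAV file into float32 samples in [-1.0, 1.0)."""
    with wave.open(path, "rb") as wav_file:
        # Assumption: the synthesized file uses 2 bytes (16 bits) per sample.
        if wav_file.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        frames = wav_file.readframes(wav_file.getnframes())
    # For multi-channel audio the samples are interleaved in the result.
    return np.frombuffer(frames, dtype="<i2").astype(np.float32) / 32768.0
```

For example, wav_file_to_float_array("outputaudio.wav") would load the file named above once synthesis has finished.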
    
    

    I hope this helps!!

    If this answers your query, please click Accept Answer and Yes for "was this answer helpful". If you have any further questions, do let us know.
