WAV audio returned from TTS is sometimes corrupted and can't be played

Question

LeetGPT 95

Hey team, I've been using Azure TTS as shown below. However, I've found that the WAV file returned by the service is sometimes corrupted and not playable. I've attached one of the audio files I received, which I kept in memory from Azure TTS.

https://storage.googleapis.com/leetgpt-audio/clrpzssxg00047kglu71ywayr/cls40f9o80003n4qd9ck4n09f/assistant_2024-02-02_05-27-38/0.wav

I can confirm this has nothing to do with the GCS upload I am using, since I downloaded the file locally and it is still not playable. I can't attach the original WAV file here because the format is not allowed. I have also noticed an increasing number of client errors on my TTS service metrics dashboard, though I'm not sure whether this is related. The behavior is quite flaky: the returned audio file is playable only about 60% of the time. Could someone take a look and let me know what the issue is?

import azure.cognitiveservices.speech as speechsdk
import os
import uuid

class AzureTTS:
    # https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support?tabs=tts#voice-styles-and-roles
    # https://speech.microsoft.com/portal/1e44aaf2148347e5a53c696ab0175042/voicegallery
    # Default: en-US-BrianNeural
    # Indian: mr-IN-ManoharNeural
    # Chinese: zh-CN-YunxiNeural
    # HR-default: en-US-AvaNeural
    def __init__(self,
                 on_audio,
                 voice='en-US-BrianNeural',
                 on_completion=None):
        # Initialize the speech configuration using environment variables
        self.speech_config = speechsdk.SpeechConfig(subscription=os.environ.get('SPEECH_KEY'), 
                                                    region=os.environ.get('SPEECH_REGION'))
        self.speech_config.speech_synthesis_voice_name=voice

        # Create a synthesizer with no audio config to use for preconnecting
        self.synthesizer = speechsdk.SpeechSynthesizer(self.speech_config, audio_config=None)

        self.on_audio = on_audio
        self.on_completion = on_completion
        self.synthesizer.synthesis_completed.connect(self.__speech_synthesizer_synthesis_completed_cb)

        # Preconnect
        self.connection = speechsdk.Connection.from_speech_synthesizer(self.synthesizer)
        self.connection.open(True)

    def synthesize(self, 
                   text: str):
        if not text or len(text) == 0:
            return

        # Start text-to-speech process
        synthesis_future = self.synthesizer.start_speaking_text_async(text)
        result = synthesis_future.get()

        audio_data_stream = speechsdk.AudioDataStream(result)
        id = uuid.uuid4()
        audio_data_stream.save_to_wav_file(f"{id}.wav")
        self.on_audio(f"{id}.wav")

    def __speech_synthesizer_synthesis_completed_cb(self, evt: speechsdk.SessionEventArgs):
        """
        Callback that signals the event: synthesis completed.
        It returns the audio duration of the synthesized speech.
        """
        if self.on_completion is not None:
            self.on_completion(evt.result.audio_duration)

LeetGPT 95 Reputation points

2024-02-02T05:53:23.4566667+00:00

This is my client error dashboard from tts.
LeetGPT 95 Reputation points

2024-02-02T05:54:24.08+00:00

This is my client error screenshot from tts service.
navba-MSFT 27,550 Reputation points Microsoft Employee Moderator

2024-02-05T04:48:26.78+00:00

@LeetGPT Just following up to check if my suggestion helped. Please let me know if you have any further queries. I would be happy to help.

Answer 1

@LeetGPT Welcome to the Microsoft Q&A Forum. Thank you for posting your query here!

Please enable Speech SDK logging as shown below:

speech_config.set_property(speechsdk.PropertyId.Speech_LogFilename, "LogfilePathAndName")

This should provide more detail on the cause of the issue. More info here: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-use-logging#sample . Also, please confirm that there are no network-level issues, such as a proxy or firewall within your corporate network, that might be intermittently causing the socket connections to drop. Hope this helps.
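
While collecting logs, it may also help to detect a bad file before it is passed on. If a socket connection drops mid-synthesis, one way this can show up is a WAV file that is empty, header-only, or shorter than its header claims. The sketch below checks for those cases using only Python's standard `wave` module. It assumes the synthesizer is producing PCM RIFF/WAV output (a compressed output format would be reported as not playable), and the helper name `wav_is_playable` is hypothetical, not part of the Speech SDK.

```python
import wave


def wav_is_playable(path: str) -> bool:
    """Return True if `path` parses as a PCM WAV file with complete audio data.

    Hypothetical helper (not part of the Speech SDK). Returns False for
    empty, malformed, header-only, truncated, or unreadable files.
    """
    try:
        with wave.open(path, "rb") as wav:
            # Number of frames the header says the data chunk contains.
            declared_frames = wav.getnframes()
            if declared_frames == 0:
                return False

            # Bytes per frame = channels * bytes per sample.
            frame_size = wav.getnchannels() * wav.getsampwidth()

            # On a truncated file this returns fewer bytes than declared.
            data = wav.readframes(declared_frames)
            return len(data) == declared_frames * frame_size
    except (wave.Error, EOFError, OSError):
        # wave.Error: bad or unsupported header; EOFError: file too short
        # to contain a header; OSError: file missing or unreadable.
        return False
```

In the `synthesize` method from the question, this could be called on the saved file before `self.on_audio(...)`, so that a failed check triggers a retry or a log entry instead of passing along an unplayable file. It only detects the symptom; the SDK log is still needed to find the cause.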
