WAV audio returned from TTS is sometimes corrupted and can't be played

LeetGPT 60 Reputation points
2024-02-02T05:45:35.1366667+00:00

Hey team, I've been using Azure TTS as shown below, but I've found that the WAV file that comes back from the service is sometimes corrupted and not playable. Here is one of the audio files I got back from Azure TTS:

https://storage.googleapis.com/leetgpt-audio/clrpzssxg00047kglu71ywayr/cls40f9o80003n4qd9ck4n09f/assistant_2024-02-02_05-27-38/0.wav I can confirm this has nothing to do with the GCS upload I am using, since the file is also not playable when I download it locally. I can't attach the original WAV file here because the format is not allowed. I also noticed an increasing number of client errors on my TTS service metrics dashboard; I'm not sure whether this is related to the issue. The behavior is quite flaky: the returned audio file is playable 60% of the time. Could someone help me take a look and let me know what the issue is?

import azure.cognitiveservices.speech as speechsdk
import os
import uuid

class AzureTTS:
    # https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support?tabs=tts#voice-styles-and-roles
    # https://speech.microsoft.com/portal/1e44aaf2148347e5a53c696ab0175042/voicegallery
    # Default: en-US-BrianNeural
    # Indian: mr-IN-ManoharNeural
    # Chinese: zh-CN-YunxiNeural
    # HR-default: en-US-AvaNeural
    def __init__(self,
                 on_audio,
                 voice='en-US-BrianNeural',
                 on_completion=None):
        # Initialize the speech configuration using environment variables
        self.speech_config = speechsdk.SpeechConfig(subscription=os.environ.get('SPEECH_KEY'), 
                                                    region=os.environ.get('SPEECH_REGION'))
        self.speech_config.speech_synthesis_voice_name=voice

        # Create a synthesizer with no audio config to use for preconnecting
        self.synthesizer = speechsdk.SpeechSynthesizer(self.speech_config, audio_config=None)

        self.on_audio = on_audio
        self.on_completion = on_completion
        self.synthesizer.synthesis_completed.connect(self.__speech_synthesizer_synthesis_completed_cb)

        # Preconnect
        self.connection = speechsdk.Connection.from_speech_synthesizer(self.synthesizer)
        self.connection.open(True)

    def synthesize(self, 
                   text: str):
        if not text or len(text) == 0:
            return

        # Start text-to-speech process
        synthesis_future = self.synthesizer.start_speaking_text_async(text)
        result = synthesis_future.get()

        audio_data_stream = speechsdk.AudioDataStream(result)
        id = uuid.uuid4()
        audio_data_stream.save_to_wav_file(f"{id}.wav")
        self.on_audio(f"{id}.wav")

    def __speech_synthesizer_synthesis_completed_cb(self, evt: speechsdk.SessionEventArgs):
        """
        Callback that signals the event: synthesis completed.
        It returns the audio duration of the synthesized speech.
        """
        if self.on_completion is not None:
            self.on_completion(evt.result.audio_duration)

Azure AI Speech
An Azure service that integrates speech processing into apps and services.

1 answer

  1. navba-MSFT 20,810 Reputation points Microsoft Employee
    2024-02-02T08:03:33.2833333+00:00

    @LeetGPT Welcome to Microsoft Q&A Forum, Thank you for posting your query here!

    Please enable Speech SDK logging as shown below:

    speech_config.set_property(speechsdk.PropertyId.Speech_LogFilename, "LogfilePathAndName")
    

    This should provide more detail about the cause of the issue. More info here: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-use-logging#sample . Also confirm that no network-level issues, such as a proxy or firewall within your corporate network, are intermittently causing the socket connections to drop. Hope this helps.
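
While collecting the logs, it may also help to detect the bad files on the client side so that each failure can be matched to a timestamp in the log. The following is only a minimal sketch using Python's standard `wave` module, not part of the Speech SDK. The helper name `check_wav` is hypothetical, and it assumes the synthesizer is producing a PCM RIFF/WAV output format, which is what `wave` can parse.

```python
import wave
from typing import Tuple


def check_wav(path: str) -> Tuple[bool, str]:
    """Return (ok, detail) for whether `path` parses as a non-empty PCM WAV file.

    Hypothetical diagnostic helper: it only inspects the file on disk and
    says nothing about why the service or network produced a bad file.
    """
    try:
        with wave.open(path, "rb") as wav_file:
            frame_count = wav_file.getnframes()
            frame_rate = wav_file.getframerate()
    except (wave.Error, EOFError, OSError) as exc:
        # Missing RIFF header, truncated chunk, empty or unreadable file.
        return False, f"unreadable: {exc!r}"
    if frame_count == 0 or frame_rate <= 0:
        return False, "header parsed but the file holds no audio frames"
    return True, f"{frame_count / frame_rate:.2f} s of audio"
```

In the code from the question, this could be called right after `save_to_wav_file` and before `on_audio`, logging the detail string whenever the first element is `False`.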
