WAV audio returned from TTS is sometimes corrupted and can't be played

LeetGPT 20 Reputation points

Hey team, I've been using Azure TTS as shown below. However, I've found that the WAV file returned by the service is sometimes corrupted and not playable. I've linked one of the audio files I got from Azure TTS, which I keep in memory.

https://storage.googleapis.com/leetgpt-audio/clrpzssxg00047kglu71ywayr/cls40f9o80003n4qd9ck4n09f/assistant_2024-02-02_05-27-38/0.wav I can confirm this has nothing to do with the GCS upload I'm using, since I downloaded the file locally and it's still not playable. I can't attach the original WAV file here because the format is not allowed. I also noticed an increasing number of client errors on my TTS service metrics dashboard; I'm not sure whether this is related to the issue. The behavior is also quite flaky: about 60% of the time, the returned audio file is playable. Could someone take a look and let me know what the issue is?

import azure.cognitiveservices.speech as speechsdk
import os
import uuid

class AzureTTS:
    # https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support?tabs=tts#voice-styles-and-roles
    # https://speech.microsoft.com/portal/1e44aaf2148347e5a53c696ab0175042/voicegallery
    # Default: en-US-BrianNeural
    # Indian: mr-IN-ManoharNeural
    # Chinese: zh-CN-YunxiNeural
    # HR-default: en-US-AvaNeural
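    def __init__(self, on_audio=None, on_completion=None):
        # Initialize the speech configuration using environment variables.
        # The original call is cut off after the subscription argument; the
        # region argument and the SPEECH_REGION variable name are assumptions.
        self.speech_config = speechsdk.SpeechConfig(subscription=os.environ.get('SPEECH_KEY'),
                                                    region=os.environ.get('SPEECH_REGION'))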

        # Create a synthesizer with no audio config to use for preconnecting
        self.synthesizer = speechsdk.SpeechSynthesizer(self.speech_config, audio_config=None)

        self.on_audio = on_audio
        self.on_completion = on_completion

        # Preconnect
        self.connection = speechsdk.Connection.from_speech_synthesizer(self.synthesizer)

    def synthesize(self, 
                   text: str):
        if not text:
            return  # nothing to synthesize; the original body is missing here

        # Start text-to-speech process
        synthesis_future = self.synthesizer.start_speaking_text_async(text)
        result = synthesis_future.get()

        audio_data_stream = speechsdk.AudioDataStream(result)
        id = uuid.uuid4()

    def __speech_synthesizer_synthesis_completed_cb(self, evt: speechsdk.SessionEventArgs):
        """
        Callback that signals the event: synthesis completed.
        It returns the audio duration of the synthesized speech.
        """
        if self.on_completion is not None:
            ...  # body cut off in the original post
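
One way to tell a bad response from a bad upload is to validate the bytes in memory before storing them. The sketch below is not part of the code above: `check_wav_bytes` is a hypothetical helper, and it assumes the service returns uncompressed PCM WAV (the kind Python's standard `wave` module reads). It checks the RIFF/WAVE header and compares the frame count declared in the header with the frames actually present.

```python
import io
import wave


def check_wav_bytes(data: bytes) -> str:
    """Return "ok" if data parses as a complete PCM WAV file, else a short reason.

    Hypothetical helper for illustration; not part of the Speech SDK.
    """
    # A RIFF/WAVE file starts with b"RIFF", a 4-byte size, then b"WAVE".
    if len(data) < 12 or data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        return "missing RIFF/WAVE header"
    try:
        with wave.open(io.BytesIO(data), "rb") as wav:
            declared = wav.getnframes()  # frame count implied by the header
            frame_size = wav.getnchannels() * wav.getsampwidth()
            # readframes returns fewer bytes than requested if the data is cut short.
            actual = len(wav.readframes(declared)) // frame_size
    except (wave.Error, EOFError) as exc:
        return f"unreadable WAV: {exc}"
    if actual == 0:
        return "no audio frames"
    if actual < declared:
        return f"truncated: header declares {declared} frames, found {actual}"
    return "ok"
```

Logging the returned reason next to each request could show whether the unplayable files are truncated or lack a valid header altogether. A header that declares a placeholder length would also be flagged, so a failure is a hint to inspect the file rather than a final verdict.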

Azure AI Speech
An Azure service that integrates speech processing into apps and services.

1 answer

  1. navba-MSFT 13,850 Reputation points Microsoft Employee

    @LeetGPT Welcome to the Microsoft Q&A forum, and thank you for posting your query here!

    Please enable Speech SDK logging as shown below:

    speech_config.set_property(speechsdk.PropertyId.Speech_LogFilename, "LogfilePathAndName")

    This should provide more details about the cause of this issue. More info here: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-use-logging#sample . Also confirm that there are no network-level issues, such as a proxy or firewall within your corporate network, that might be intermittently causing the socket connections to drop. Hope this helps.
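
If the cause turns out to be intermittent connection drops, a caller-side retry may reduce the impact while the logs are being investigated. This is a minimal sketch, not an SDK feature: `synthesize_with_retry`, its parameters, and the two callables it takes are hypothetical names. `synthesize` stands for whatever function returns the audio bytes, and `is_valid` for whatever playability check is in use.

```python
import time
from typing import Callable, Optional


def synthesize_with_retry(
    synthesize: Callable[[str], bytes],
    is_valid: Callable[[bytes], bool],
    text: str,
    attempts: int = 3,
    base_delay: float = 0.5,
) -> bytes:
    """Call synthesize(text) until is_valid accepts the result or attempts run out."""
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            audio = synthesize(text)
            if is_valid(audio):
                return audio
            last_error = ValueError(f"attempt {attempt + 1}: audio failed validation")
        except Exception as exc:  # e.g. a dropped connection raised by the caller's function
            last_error = exc
        if attempt < attempts - 1:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            time.sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"no playable audio after {attempts} attempts") from last_error
```

A retry only masks the symptom, so the SDK log is still needed to find out why individual requests fail.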
