Why has my TTS suddenly become bad? Speed & punctuation aren't working properly.

etienne Brassard 25 Reputation points

This morning I tried to work on my TTS file using Brian's voice. But once I listened to the speech, the punctuation & speed weren't working properly. Also, it seems that his voice became monotone. I've tried with an already-finished project to see if the problem was the file, but it seems that every file has the same problems.

I compared both versions (the finished/already downloaded project & the current one) and you can clearly see the quality difference between those two. Last week, everything was working properly.


Azure AI Speech
An Azure service that integrates speech processing into apps and services.

Accepted answer
  1. Konstantinos Passadis 17,376 Reputation points MVP

    Hello @etienne Brassard!

    Welcome to Microsoft Q&A!

    I checked whether there are any known issues on Azure, but the service appears healthy in all regions.

    The only thing I found (it is old, but it may be the issue) is this item in the December update.


    I suggest also trying other voices to see whether they work, and checking the rate limits just in case.
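To compare voices side by side, a small script can send the same text to several voices and save one WAV file per voice. This is only a sketch: it reuses the REST endpoint pattern and headers from the curl request later in this thread, the short names `en-US-AvaNeural` and `en-US-AndrewNeural` are assumed to follow the same pattern as `en-US-BrianNeural`, the region identifier `eastus` is an assumption for "East US", and the helper names are made up for illustration.

```python
import urllib.request
from xml.sax.saxutils import escape, quoteattr

# Endpoint pattern taken from the curl example in this thread.
ENDPOINT = "https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"

# Assumed short names; only en-US-BrianNeural appears verbatim in the thread.
VOICES = ["en-US-BrianNeural", "en-US-AvaNeural", "en-US-AndrewNeural"]


def build_ssml(voice: str, text: str, rate: str = "-8.00%") -> str:
    """Wrap the text in a single <voice>/<prosody> pair for the given voice."""
    return (
        '<speak xmlns="http://www.w3.org/2001/10/synthesis" '
        'version="1.0" xml:lang="en-US">'
        f"<voice name={quoteattr(voice)}>"
        f"<prosody rate={quoteattr(rate)}>{escape(text)}</prosody>"
        "</voice></speak>"
    )


def build_request(region: str, key: str, voice: str, text: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for one voice."""
    return urllib.request.Request(
        ENDPOINT.format(region=region),
        data=build_ssml(voice, text).encode("utf-8"),
        headers={
            "X-Microsoft-OutputFormat": "riff-24khz-16bit-mono-pcm",
            "Content-Type": "application/ssml+xml",
            "Ocp-Apim-Subscription-Key": key,
            "User-Agent": "tts-voice-comparison",  # arbitrary client name
        },
        method="POST",
    )


def save_samples(region: str, key: str, text: str, voices=VOICES) -> None:
    """Synthesize the same text with each voice and write <voice>.wav files."""
    for voice in voices:
        request = build_request(region, key, voice, text)
        with urllib.request.urlopen(request, timeout=30) as response:
            with open(f"{voice}.wav", "wb") as out:
                out.write(response.read())


# Show the SSML that would be sent, without calling the service.
print(build_ssml("en-US-BrianNeural", "Hello, world."))
```

Calling `save_samples("eastus", "<your key>", "some test sentence")` would then produce one file per voice, which makes it easier to hear whether the problem is tied to specific voices.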


    If none of this helps and changing the voice makes no difference, could you provide the code? Remove any sensitive info so we can have a look!

    Which SDK are you using?

    Which region?

    An overall description of the project would also help!


    I hope this helps!

    Kindly mark the answer as Accepted and Upvote in case it helped!



2 additional answers

  1. etienne Brassard 25 Reputation points

    I've tried other voices. For example, Ava's voice seems fine, but Andrew's has the same issue. I have both versions of the text below, and the problem is much easier to discern when listening to both versions.

    My project consists of making explainer videos with a voiceover. Honestly, I'm not sure how to retrieve the code, but I'll post the SSML.


    My SDK for Azure Speech Studio is the Azure SDK for the Speech service

    Region: East US


    <!--ID=B7267351-473F-409D-9765-754A8EBCDE05;Version=1|{"VoiceNameToIdMapItems":[{"Id":"c15219d0-f768-41d1-99cd-b1ce8e31c565","Name":"Microsoft Server Speech Text to Speech Voice (en-US, BrianNeural)","ShortName":"en-US-BrianNeural","Locale":"en-US","VoiceType":"StandardVoice"}]}-->
    <speak xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="http://www.w3.org/2001/mstts" xmlns:emo="http://www.w3.org/2009/10/emotionml" version="1.0" xml:lang="en-US"><voice name="en-US-BrianNeural" sentenceboundarysilence-exact="300ms" tailingsilence-exact="500ms" commasilence-exact="10ms"><prosody rate="-8.00%">In the past, someone with a $1 million portfolio would easily have enough income to retire. They would have plenty of money to pay for housing, food, healthcare, transportation and take nice vacations when desired. However, $1 million has a fraction of the buying power it had for previous generations. Some people might not even find that it would produce enough income without having to make some major sacrifices, continuing to work to some extent. That's not to say you couldn't retire comfortably with $1 million, but there are some key things to consider.</prosody></voice>
    <voice name="en-US-BrianNeural" sentenceboundarysilence-exact="300ms" tailingsilence-exact="700ms" commasilence-exact="10ms"><prosody rate="-8.00%">It may sounds like a lot of money, but how much income will $1 million portfolio produce? There's no exact science to determine the precise amount of annual income, but experts recommend using a 4% withdrawal rate. By only withdrawing </prosody>4%<prosody rate="-8.00%"> of your portfolio value each year, it's extremely unlikely that you'd ever run out of money. By using the 4% rule, retirees can withdraw</prosody> 4%<prosody rate="-8.00%"> of their total portfolio value the first year during retirement and make adjustments for inflation for each subsequent year. Assuming a 4% safe withdrawal rate, you would receive </prosody>$40000<prosody rate="-8.00%"> per year on income. This guideline is based on historical stock and bond market performance, which is always being updated. Some people recommend using a 3% or even 5% withdrawal rate. Depending on things like the expected length of retirement, you might decide to withdraw a different amount, but 4% is a widely accepted rule of thumb.</prosody></voice></speak>
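One thing that can be checked directly in the SSML above: a few fragments (for example "4%" and "$40000") sit between `<prosody>` elements rather than inside them, so the `rate="-8.00%"` setting does not cover them. Whether this has anything to do with the change in quality is not established in this thread. The sketch below lists such fragments using Python's standard XML parser; the function name and the shortened sample are made up for illustration.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def text_outside_prosody(ssml: str) -> list[str]:
    """Return text that sits directly under <voice> instead of inside a child element."""
    root = ET.fromstring(ssml)
    stray = []
    for voice in root.iter(f"{{{SSML_NS}}}voice"):
        # Text before the first child element of <voice>.
        if voice.text and voice.text.strip():
            stray.append(voice.text.strip())
        # Text that follows each child element (its "tail").
        for child in voice:
            if child.tail and child.tail.strip():
                stray.append(child.tail.strip())
    return stray


# Shortened sample with the same structure as the SSML posted above.
sample = (
    '<speak xmlns="http://www.w3.org/2001/10/synthesis" version="1.0" xml:lang="en-US">'
    '<voice name="en-US-BrianNeural">'
    '<prosody rate="-8.00%">By only withdrawing </prosody>4%'
    '<prosody rate="-8.00%"> of your portfolio value each year.</prosody>'
    "</voice></speak>"
)

print(text_outside_prosody(sample))  # ['4%']
```

Running the same function on the full exported SSML would show every fragment that is spoken outside the prosody setting.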

  2. Alex Orfanoudakis 0 Reputation points

    @Konstantinos Passadis Hello, I'm experiencing the same thing. Even a simple HTTP request with EmmaNeural, for example, produces an artifact on the word 'hi'. The overall style and quality of the voice also seem to have degraded over the past 1-2 months.

    # Assumption: the voice name is en-US-EmmaNeural (the original snippet ends inside the <voice> tag).
    curl --location 'https://westus3.tts.speech.microsoft.com/cognitiveservices/v1' \
    --header 'X-Microsoft-OutputFormat: riff-24khz-16bit-mono-pcm' \
    --header 'Content-Type: application/ssml+xml' \
    --header 'Ocp-Apim-Subscription-Key: somekey' \
    --data '<speak version='\''1.0'\'' xml:lang='\''en-US'\''>
        <voice xml:lang='\''en-US'\'' xml:gender='\''Female'\'' name='\''en-US-EmmaNeural'\''>
            Hi! This is Emma, How can i help today?
        </voice>
    </speak>'

    Is this the right forum to raise this?