Text-to-Speech: number normalization rules

Kirill Kholodilin
2021-10-14T10:56:56.753+00:00

Hi all!

I have a sentence that I want to convert to speech:

Insgesamt wurde laut Landesamt im Nordosten bisher bei 45646 Menschen eine Corona-Infektion nachgewiesen, 43609 Menschen gelten als genesen.

When Azure TTS reads this text in German, the first number is read as a whole number (fünfundvierzigtausendsechshundertsechsundvierzig), but the second is read as a sequence of digits (vier-drei-sechs-null-neun).

What are the general rules for number normalization? Why is the first number read as a whole number while the second isn't?

EDIT: I could reproduce the same behavior in English:

According to the state office in the northeast, a total of 45646 people have so far been found to have a corona infection, 43609 are considered to have recovered.

Tags: Azure AI Speech, Azure AI services

2 answers

  1. GiftA-MSFT
    2021-10-14T20:47:56.027+00:00

    Hi, I'm not able to reproduce this issue with TTS. On the demo sample page, '43609' is read as 'dreiundvierzigtausendsechshundertneun'. If you're still getting inconsistent results, please share a sample of your request so we can investigate further. Thanks!



    --- *Kindly Accept Answer if the information helps. Thanks.*


  2. GiftA-MSFT
    2021-10-26T18:10:38.387+00:00

    Hi, following up. How TTS reads digits depends on context: the service makes a "smart" decision based on the surrounding text, so it will not be correct 100% of the time. You can fix such cases with the SSML `say-as` element.
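
    A minimal sketch of the `say-as` fix in Python, using only the standard library: it escapes the text for XML, wraps every run of digits in `<say-as interpret-as="cardinal">` so each number is read as a whole number, and builds a small SSML document around it. The helper names `wrap_numbers_as_cardinal` and `build_ssml` are hypothetical, and the voice name `de-DE-KatjaNeural` is only an assumed example; substitute the voice you actually use. For the opposite behavior (digit-by-digit reading), Azure's SSML documentation also lists `interpret-as="digits"`.

    ```python
    import re
    from xml.sax.saxutils import escape

    # Assumption: example voice only; replace with the Azure voice you synthesize with.
    VOICE = "de-DE-KatjaNeural"


    def wrap_numbers_as_cardinal(text: str) -> str:
        """Escape text for XML, then wrap each run of digits in a cardinal say-as."""
        escaped = escape(text)  # turns &, <, > into entities; digits are untouched
        return re.sub(
            r"\d+",  # deliberately simple: no decimals or thousands separators
            lambda m: f'<say-as interpret-as="cardinal">{m.group(0)}</say-as>',
            escaped,
        )


    def build_ssml(text: str, lang: str = "de-DE", voice: str = VOICE) -> str:
        """Build a minimal single-voice SSML document around the wrapped text."""
        body = wrap_numbers_as_cardinal(text)
        return (
            '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
            f'xml:lang="{lang}">'
            f'<voice name="{voice}">{body}</voice>'
            "</speak>"
        )


    if __name__ == "__main__":
        sentence = (
            "Insgesamt wurde laut Landesamt im Nordosten bisher bei 45646 Menschen "
            "eine Corona-Infektion nachgewiesen, 43609 Menschen gelten als genesen."
        )
        print(build_ssml(sentence))
    ```

    The resulting string can be passed to the Speech SDK's `SpeechSynthesizer.speak_ssml_async` instead of the plain text, so the service no longer has to guess how to read either number.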


