KirillKholodilin-0635 asked GiftA-MSFT answered

Text to Speech numbers Normalization rules

Hi all!

I have a sentence to be converted to speech:

Insgesamt wurde laut Landesamt im Nordosten bisher bei 45646 Menschen eine Corona-Infektion nachgewiesen, 43609 Menschen gelten als genesen.

When Azure TTS reads this text in German, the first number is read as a cardinal number (Fünfundvierzigtausendsechshundertsechsundvierzig), while the second one is read digit by digit (vier-drei-sechs-null-neun).

What are the general rules for number normalization, and why is the first number read normally while the second isn't?

EDIT: I could reproduce the same behavior in English:

According to the state office in the northeast, a total of 45646 people have so far been found to have a corona infection, 43609 are considered to have recovered.



azure-cognitive-services azure-speech

GiftA-MSFT answered GiftA-MSFT commented

Hi, I'm not able to reproduce this issue for TTS. When using the demo sample page, I'm getting 'dreiundvierzigtausendsechshundertneun' for '43609'. If you're still getting inconsistent results, please share a sample of your request so we can investigate further. Thanks!




--- Kindly Accept Answer if the information helps. Thanks.




Thank you for the reply. Indeed, if I use the bare number on the sample demo page, it is pronounced correctly. If I use the exact sentence that I've attached, I still get this mispronunciation. Looks like normalization depends on context here?

Could you please try with the whole sentence:

Insgesamt wurde laut Landesamt im Nordosten bisher bei 45646 Menschen eine Corona-Infektion nachgewiesen, 43609 Menschen gelten als genesen.

Thank you.

GiftA-MSFT commented

I noticed that adding a comma after the number gave the correct pronunciation. It's unclear what the rules are for German, but I'll ask the product group and share details soon. In the meantime, you can use the SSML say-as element to control how the number is read.



GiftA-MSFT answered

Hi, following up. How TTS reads digits depends on context: the engine makes a "smart" decision based on the surrounding text, and that decision will not be correct 100% of the time. However, you can fix such issues with the SSML say-as element.
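
As an illustration of the say-as workaround, here is a minimal Python sketch that wraps every run of digits in a sentence in a say-as element and builds the SSML document around it. The helper names (wrap_numbers_as_cardinal, build_ssml) are hypothetical, and the voice name de-DE-KatjaNeural and the interpret-as value "cardinal" are assumptions; please check them against the SSML documentation for your region and voice.

```python
import re
from xml.sax.saxutils import escape


def wrap_numbers_as_cardinal(text: str) -> str:
    """Escape text for XML and wrap each run of digits in a say-as element."""
    # re.split with a capturing group returns alternating
    # [plain text, digits, plain text, digits, ...].
    parts = re.split(r"(\d+)", text)
    out = []
    for index, part in enumerate(parts):
        if index % 2 == 1:  # odd indices hold the captured digit runs
            # "cardinal" is assumed to be a supported interpret-as value.
            out.append(f'<say-as interpret-as="cardinal">{part}</say-as>')
        else:
            out.append(escape(part))  # escape &, <, > in ordinary text
    return "".join(out)


def build_ssml(text: str, lang: str = "de-DE",
               voice: str = "de-DE-KatjaNeural") -> str:
    """Build an SSML document; the default voice name is an assumption."""
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang="{lang}">'
        f'<voice name="{voice}">{wrap_numbers_as_cardinal(text)}</voice>'
        "</speak>"
    )


if __name__ == "__main__":
    sentence = (
        "Insgesamt wurde laut Landesamt im Nordosten bisher bei 45646 "
        "Menschen eine Corona-Infektion nachgewiesen, 43609 Menschen "
        "gelten als genesen."
    )
    print(build_ssml(sentence))
```

The resulting string can be sent as SSML input instead of plain text, for example through the Speech SDK's speak_ssml_async method. Note that this sketch wraps every digit run, so years, phone numbers, or postal codes in the same text would also be read as cardinals; in that case it is safer to wrap only the numbers you know are misread.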



--- Kindly Accept Answer if the information helps. Thanks.
