The voice is not clear when text is converted to speech

Jeremy Blume 1 Reputation point

Microsoft text to speech has a Chinese voice, "Yunxi". When the speaking rate is set to 1.2x, the output is noisy or unclear; at the normal speaking rate it is clearer. You can reproduce this with the following test text (at 1.2x speaking rate): 黎辉深夜外出寻找乔昭,表现出的关心,让黎皎心里彻底扭曲,连弟弟也开始关心黎三了?现在就黎光文还没回来,全家上下,最不靠谱的那个。这时,青筠火急火燎的跑了进来,不得了啦,大老爷被锦鳞卫押送回府了。听到锦鳞卫三个字, 一屋子人脸色腾的变了。当一家子人胆战心惊走到院子里,黎光文已经走了过来,邓老夫人左右环顾,老大,锦鳞卫呢?你跟娘说实话,咱没犯事吧,黎光文努力回忆,没啊,

Pay attention to "当一家子人" and "没啊".
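
To make the report easier to reproduce, the request can be written as SSML. The following is a minimal sketch, not the poster's actual setup: it assumes the 1.2x speed was applied through the SSML `prosody` element's `rate` attribute and that "Yunxi" refers to the `zh-CN-YunxiNeural` voice. The helper name `build_ssml` is hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr

# Assumption: "Yunxi" in the question is the zh-CN-YunxiNeural voice.
DEFAULT_VOICE = "zh-CN-YunxiNeural"


def build_ssml(text: str, rate: str = "1.2", voice: str = DEFAULT_VOICE) -> str:
    """Wrap `text` in SSML that speaks it with the given voice and rate.

    `rate` is passed through to <prosody rate="...">; "1.2" is meant as a
    multiplier of the default speaking rate (an assumption about how the
    1.2x speed in the question was configured).
    """
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="zh-CN">'
        f"<voice name={quoteattr(voice)}>"
        # escape() protects characters such as & and < in the input text
        f"<prosody rate={quoteattr(rate)}>{escape(text)}</prosody>"
        "</voice></speak>"
    )


if __name__ == "__main__":
    # One of the phrases the question asks readers to listen for.
    print(build_ssml("当一家子人胆战心惊走到院子里"))
```

Building the same text once with `rate="1.0"` and once with `rate="1.2"` gives two requests that differ only in speed, which makes the noise easier to isolate.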

Azure AI Speech
An Azure service that integrates speech processing into apps and services.

1 answer

  1. YutongTie-MSFT 43,076 Reputation points

    Hello @Jeremy Blume

    I hope my comment addresses your issue. We checked the synthesized output and it works well on our side. Only at the low-quality 8 kHz setting are the phrases "当一家子人" and "没啊" returned with some electrical noise, which is roughly what we would expect at this quality and speed.

    However, I have forwarded this issue to the product group: when the speed is increased, the quality of the output is not good. I hope this will be improved in the future.

    Thanks for reporting this again, and I hope my answer is helpful.
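
Since the answer ties the noise to the 8 kHz output quality, one way to check this locally is to synthesize the same SSML at 8 kHz and at a higher sample rate and compare the results. The sketch below assumes the Azure Speech SDK for Python (`azure-cognitiveservices-speech`) is installed and that you supply your own key and region; the function name `synthesize_at_formats` and the choice of 24 kHz as the comparison format are illustrative assumptions, not something stated in the thread.

```python
# Output formats to compare: the 8 kHz case mentioned in the answer,
# and a 24 kHz format chosen here as an assumed higher-quality reference.
FORMAT_NAMES = ("Riff8Khz16BitMonoPcm", "Riff24Khz16BitMonoPcm")


def synthesize_at_formats(ssml: str, key: str, region: str) -> dict:
    """Synthesize `ssml` once per format and return {format name: WAV bytes}."""
    # Imported lazily so the module loads even where the SDK is not installed.
    import azure.cognitiveservices.speech as speechsdk

    results = {}
    for name in FORMAT_NAMES:
        config = speechsdk.SpeechConfig(subscription=key, region=region)
        config.set_speech_synthesis_output_format(
            getattr(speechsdk.SpeechSynthesisOutputFormat, name)
        )
        # audio_config=None keeps the audio in memory instead of playing it.
        synthesizer = speechsdk.SpeechSynthesizer(
            speech_config=config, audio_config=None
        )
        result = synthesizer.speak_ssml_async(ssml).get()
        if result.reason != speechsdk.ResultReason.SynthesizingAudioCompleted:
            raise RuntimeError(f"Synthesis failed for {name}: {result.reason}")
        results[name] = result.audio_data
    return results
```

Writing each entry of the returned dictionary to a `.wav` file and listening to the two flagged phrases should show whether the noise is limited to the 8 kHz output, as the answer suggests.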

