Visemes don't match speech for Cantonese Azure text to speech
muxic muxic
When I use a Cantonese voice, e.g. "zh-HK-WanLungNeural", to synthesize speech with Azure text to speech from input such as "在美國哪裡可以買到自行車?還有哪裡可以看到蒙娜麗莎?", the viseme events don't cover the full length of the audio. For example, for 6 seconds of audio, the last viseme ends at 5 seconds.
The same code, with the voice changed to an English or Mandarin one and input text in the matching language, produces valid visemes.
Visemes are generated in accordance with the docs here: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-speech-synthesis-viseme?tabs=visemeid&pivots=programming-language-python
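For reference, here is a minimal sketch of how the mismatch can be measured. It assumes the Python Speech SDK (`azure-cognitiveservices-speech`), that `evt.audio_offset` is reported in 100-nanosecond ticks as in the linked docs, and that `result.audio_duration` is a `timedelta`. The helper names `collect_viseme_offsets` and `viseme_gap_seconds`, and the `key` and `region` parameters, are placeholders for illustration and not part of the SDK.

```python
TICKS_PER_SECOND = 10_000_000  # audio_offset is assumed to be in 100 ns ticks


def viseme_gap_seconds(offsets_ticks, audio_duration_s):
    """Return seconds between the last viseme offset and the end of the audio."""
    if not offsets_ticks:
        # No viseme events at all: the whole clip is uncovered.
        return audio_duration_s
    last_viseme_s = max(offsets_ticks) / TICKS_PER_SECOND
    return audio_duration_s - last_viseme_s


def collect_viseme_offsets(text, voice, key, region):
    """Synthesize `text` and return (viseme offsets in ticks, audio seconds).

    Requires the Speech SDK and valid credentials; imported lazily so the
    helper above can be used without the SDK installed.
    """
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    speech_config.speech_synthesis_voice_name = voice
    # audio_config=None keeps the audio in memory instead of playing it.
    synthesizer = speechsdk.SpeechSynthesizer(
        speech_config=speech_config, audio_config=None
    )

    offsets = []
    synthesizer.viseme_received.connect(
        lambda evt: offsets.append(evt.audio_offset)
    )
    result = synthesizer.speak_text_async(text).get()
    return offsets, result.audio_duration.total_seconds()


if __name__ == "__main__":
    # Numbers from the description: last viseme at 5 s, audio lasting 6 s.
    print(viseme_gap_seconds([500_000, 50_000_000], 6.0))
```

Each offset marks where a viseme starts, so a small gap from trailing silence would be expected for any voice. What I see with the Cantonese voice is a gap of about a second, which the English and Mandarin voices don't show.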
Azure AI Speech