Visemes don't match speech for Cantonese Azure text-to-speech

muxic
2024-08-15T02:21:10.94+00:00

When I use a Cantonese voice such as "zh-HK-WanLungNeural" to synthesize speech from text input such as "在美國哪裡可以買到自行車?還有哪裡可以看到蒙娜麗莎?" ("Where in the US can I buy a bicycle? And where can I see the Mona Lisa?"), the generated visemes do not cover the full length of the audio. For example, for 6 seconds of audio, the visemes end at 5 seconds.

The same code, with only the voice switched to an English or Mandarin one and the input text in the matching language, produces visemes that line up with the audio.

I generate the visemes following the documentation here: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-speech-synthesis-viseme?tabs=visemeid&pivots=programming-language-python
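To put a number on the mismatch, here is a minimal Python sketch that follows the viseme pattern from the linked documentation. It connects a callback to `viseme_received`, records each event's `audio_offset` (reported in 100-nanosecond ticks), and compares the last offset with the audio duration. It assumes the `azure-cognitiveservices-speech` package is installed, that credentials live in `SPEECH_KEY`/`SPEECH_REGION` environment variables, and that your SDK version exposes `audio_duration` on the synthesis result. The helper names `viseme_gap` and `synthesize_with_visemes` are hypothetical, chosen for illustration only.

```python
import os
from typing import List, Tuple

# Viseme audio_offset values are reported in ticks of 100 nanoseconds,
# so 10,000,000 ticks make one second.
TICKS_PER_SECOND = 10_000_000


def viseme_gap(offsets_ticks: List[int], audio_duration_s: float) -> Tuple[float, float]:
    """Hypothetical helper: return (last viseme start in seconds, audio left after it)."""
    if not offsets_ticks:
        return 0.0, audio_duration_s
    last_s = max(offsets_ticks) / TICKS_PER_SECOND
    return last_s, audio_duration_s - last_s


def synthesize_with_visemes(text: str, voice: str = "zh-HK-WanLungNeural"):
    """Hypothetical helper: synthesize `text`, return (viseme events, audio seconds).

    Assumes the azure-cognitiveservices-speech package is installed and that the
    SPEECH_KEY / SPEECH_REGION environment variables hold valid credentials.
    """
    # Imported here so viseme_gap above can be used without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(
        subscription=os.environ["SPEECH_KEY"], region=os.environ["SPEECH_REGION"]
    )
    speech_config.speech_synthesis_voice_name = voice
    # audio_config=None keeps the synthesized audio in memory instead of playing it.
    synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=None)

    # Each viseme event carries the offset where a mouth shape starts and its ID.
    visemes: List[Tuple[int, int]] = []
    synthesizer.viseme_received.connect(
        lambda evt: visemes.append((evt.audio_offset, evt.viseme_id))
    )

    result = synthesizer.speak_text_async(text).get()
    if result.reason != speechsdk.ResultReason.SynthesizingAudioCompleted:
        raise RuntimeError(f"Synthesis did not complete: {result.reason}")
    # Assumption: audio_duration is a datetime.timedelta on this SDK version.
    return visemes, result.audio_duration.total_seconds()
```

Call `synthesize_with_visemes(...)`, then pass `[offset for offset, _ in visemes]` and the duration to `viseme_gap`. Running the same check with an English or Mandarin voice gives a baseline: because each offset marks where a mouth shape starts, a short gap after the last viseme is normal, whereas a gap of about a second on 6 seconds of audio, as described above, points at the Cantonese voice.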

Azure AI Speech