Visemes to control the movement of 2D and 3D characters

Nathalie Froissart 25 Reputation points
2023-02-21T07:37:37.3+00:00

Hello! I am trying to find helpful information regarding the use of visemes to control the movement of 2D and 3D avatar models. I am also unsure if I have to create the character myself or if there are some predefined characters I can use. Can someone please provide me with more detailed information?

All I can find on Microsoft Learn is the headline "Get facial position with viseme". Is there a video on how to actually connect a 3D character with the viseme output?

Azure AI Speech
An Azure service that integrates speech processing into apps and services.

1 answer

  1. YutongTie-MSFT 53,981 Reputation points Moderator
    2023-02-22T00:57:48.83+00:00

    Hello Nathalie Froissart,

    Thanks for reaching out to us. Unfortunately there is no video on how to connect a 3D character with visemes; there is only a video introducing visemes - https://www.youtube.com/watch?v=ui9XT47uwxs It covers 3D visemes from 2:12. Since you mentioned the documentation, I assume you have already seen this page - https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-speech-synthesis-viseme?tabs=3dblendshapes&pivots=programming-language-csharp#3d-blend-shapes-animation

    In one sentence: the Azure text-to-speech viseme feature gives you the voice output plus a blend shapes JSON timeline for the visemes, and you use that result to drive character animation. You can build your own characters and animate them automatically.
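    As a minimal sketch of that flow (using the Speech SDK for Python; the key, region, example voice, and the `handle_blend_shapes` handler are placeholders you would replace with your own), you subscribe to viseme events and request the blend shapes output through the `mstts:viseme` element in SSML:

    ```python
    import azure.cognitiveservices.speech as speechsdk

    # Placeholder credentials - replace with your own Speech resource values.
    speech_config = speechsdk.SpeechConfig(subscription="YourKey", region="YourRegion")
    # audio_config=None: we only want the synthesis result and viseme events here.
    synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=None)

    def on_viseme(evt):
        # audio_offset is in ticks (100 ns); evt.animation carries the blend
        # shapes JSON only when the SSML requests type='FacialExpression'.
        print(f"offset: {evt.audio_offset / 10000} ms, viseme id: {evt.viseme_id}")
        if evt.animation:
            handle_blend_shapes(evt.animation)  # hypothetical handler you implement

    synthesizer.viseme_received.connect(on_viseme)

    ssml = """<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis'
           xmlns:mstts='http://www.w3.org/2001/mstts' xml:lang='en-US'>
      <voice name='en-US-JennyNeural'>
        <mstts:viseme type='FacialExpression'/>
        Hello, this drives my avatar's face.
      </voice>
    </speak>"""
    synthesizer.speak_ssml_async(ssml).get()
    ```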


    Besides the video and document you have mentioned, note that the character used in the sample is from Mixamo - https://www.mixamo.com/#/

    For 3D visemes, the most important part is the blend shapes event: once you receive the viseme output, you can use the blend shapes to drive the facial movements of a 3D character that you designed.

    The blend shapes JSON string is represented as a two-dimensional matrix. Each row represents a frame, and each frame (at 60 FPS) contains an array of 55 facial positions. This page shows how to get it - https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-speech-synthesis-viseme?tabs=3dblendshapes&pivots=programming-language-csharp#get-viseme-events-with-the-speech-sdk
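    As a rough illustration of consuming that payload (the parsing follows the format described in that document; `apply_to_rig` is a hypothetical engine-specific callback you would implement), each event's `animation` JSON can be unpacked into per-frame weight arrays:

    ```python
    import json

    def frames_from_animation(animation_json):
        """Yield (time_seconds, weights) for one viseme event's payload.

        The payload is a JSON object with "FrameIndex" (where this chunk starts
        on the overall timeline) and "BlendShapes" (one row per frame at 60 FPS,
        each row holding 55 facial-position weights).
        """
        data = json.loads(animation_json)
        start = data["FrameIndex"]
        for i, weights in enumerate(data["BlendShapes"]):
            yield (start + i) / 60.0, weights

    # Hypothetical usage: apply_to_rig sets your character's 55 blend shape
    # weights at the given time - the engine-specific part you implement.
    # for t, weights in frames_from_animation(evt.animation):
    #     apply_to_rig(t, weights)
    ```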

    More information about Azure Viseme - https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-synthesis-markup-structure#viseme-element

    Though there is no end-to-end sample for this, I have found an external example you may want to refer to - https://www.linkedin.com/pulse/azure-lip-sync-3d-model-animation-curtesy-amazon-peter-laker?trk=pulse-article_more-articles_related-content-card The author built the animation with various cloud services, including Azure.

    This is the GitHub repo - https://github.com/ProjectPete/amazon-sumerian-hosts/blob/mainline/examples/three-azure-v2.html

    Sorry for the lack of documentation; I hope this information helps. Please let me know if you need more detail on any point, and we are happy to get more information from the product team side.

    Regards,

    Yutong

    -Please kindly accept the answer if you find it helpful, to support the community. Thanks a lot.

