Hello bill carter
Thanks for reaching out to us with this question. For 2D character lip sync, you can design a character that suits your scenario and use a Scalable Vector Graphics (SVG) image for each viseme ID to get a time-based face position. Together with the temporal tags provided by the viseme events, these well-designed SVGs are processed with smoothing modifications to give users robust animation. For example, the linked documentation shows a red-lipped character designed for language learning. You can try the red-lips animation experience in Bing Translator, and learn more about how visemes are used to demonstrate the correct pronunciation of words.

Please refer to this document - https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-speech-synthesis-viseme?pivots=programming-language-csharp&tabs=2dsvg#viseme-id
To get viseme with your synthesized speech, subscribe to the VisemeReceived event in the Speech SDK.
Note: To request SVG or blend shapes output, you should use the mstts:viseme element in SSML. For details, see how to use the viseme element in SSML.
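As a minimal sketch of that SSML (based on my reading of the linked documentation; the voice name and sentence are placeholders, and type="redlips_front" requests the 2D SVG output while type="FacialExpression" requests blend shapes):

<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="http://www.w3.org/2001/mstts" xml:lang="en-US">
  <voice name="en-US-JennyNeural">
    <mstts:viseme type="redlips_front"/>
    The rainbow has seven colors.
  </voice>
</speak>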
The following snippet shows how to subscribe to the viseme event:
using (var synthesizer = new SpeechSynthesizer(speechConfig, audioConfig))
{
    // Subscribes to viseme received event
    synthesizer.VisemeReceived += (s, e) =>
    {
        Console.WriteLine($"Viseme event received. Audio offset: " +
            $"{e.AudioOffset / 10000}ms, viseme id: {e.VisemeId}.");

        // `Animation` is an xml string for SVG or a json string for blend shapes
        var animation = e.Animation;
    };

    // If VisemeID is the only thing you want, you can also use `SpeakTextAsync()`
    var result = await synthesizer.SpeakSsmlAsync(ssml);
}
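Note that the snippet assumes speechConfig, audioConfig, and ssml are already defined and that it runs inside an async method. A minimal sketch of that setup could look like the following (the key, region, and file names are placeholders):

using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

// Placeholders: use your own Speech resource key and region
var speechConfig = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");

// Write the synthesized audio to a wav file (or use AudioConfig.FromDefaultSpeakerOutput())
var audioConfig = AudioConfig.FromWavFileOutput("output.wav");

// The SSML shown above, containing the mstts:viseme element
var ssml = System.IO.File.ReadAllText("viseme-sample.xml");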
Here's an example of the viseme output.
The SVG output is an XML string that contains the animation. Render the SVG animation along with the synthesized speech to see the mouth movement.
<svg width="1200px" height="1200px" ..>
  <g id="front_start" stroke="none" stroke-width="1" fill="none" fill-rule="evenodd">
    <animate attributeName="d" begin="d_dh_front_background_1_0.end" dur="0.27500
...
After you obtain the viseme output, you can use these events to drive character animation. You can build your own characters and automatically animate them.
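As a rough illustration of the idea (a hypothetical sketch, not from any official sample), you could store the viseme events captured in the VisemeReceived handler and, at playback time, map each viseme ID to a pre-drawn mouth frame of your own character; the class and the frame file names below are placeholders:

using System.Collections.Generic;

// Hypothetical helper: one record per viseme event captured from VisemeReceived
public record VisemeFrame(double OffsetMs, int VisemeId);

public class SimpleLipSyncTimeline
{
    // Placeholder mapping from viseme IDs to your own pre-drawn mouth images;
    // design one image per viseme ID used by your voice (viseme ID 0 is silence)
    private static readonly Dictionary<int, string> MouthFrames = new()
    {
        { 0, "mouth_silence.svg" },
        { 1, "mouth_open.svg" },
        { 2, "mouth_wide.svg" },
        // ... one entry per viseme ID you care about
    };

    private readonly List<VisemeFrame> frames = new();

    // Call this from the VisemeReceived handler with e.AudioOffset / 10000 and e.VisemeId
    public void Add(double offsetMs, int visemeId) => frames.Add(new VisemeFrame(offsetMs, visemeId));

    // At playback time, pick the frame whose offset has most recently passed
    public string GetFrameAt(double playbackMs)
    {
        string current = MouthFrames[0];
        foreach (var f in frames)
        {
            if (f.OffsetMs > playbackMs) break;
            if (MouthFrames.TryGetValue(f.VisemeId, out var image)) current = image;
        }
        return current;
    }
}

During audio playback you would call GetFrameAt with the current playback position in milliseconds and draw the returned mouth image over your character.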
Though there is no complete end-to-end sample for how to do so, I have found an external example you may want to refer to - https://www.linkedin.com/pulse/azure-lip-sync-3d-model-animation-curtesy-amazon-peter-laker?trk=pulse-article_more-articles_related-content-card The author built the animation with various cloud services, including Azure.
This is the GitHub repo - https://github.com/ProjectPete/amazon-sumerian-hosts/blob/mainline/examples/three-azure-v2.html
Regards,
Yutong
-Please kindly accept the answer if you find it helpful, to support the community. Thanks a lot.