code sample for lip sync

bill carter 20 Reputation points
2023-02-23T21:09:55.3833333+00:00

I am working on providing pronunciation teaching with the Speech service. I need the pronunciation with lip gestures. Can someone provide me a code sample?

Azure AI Speech
An Azure service that integrates speech processing into apps and services.

Accepted answer
  1. YutongTie-MSFT 53,966 Reputation points Moderator
    2023-02-23T22:11:47.26+00:00

    Hello bill carter,

    Thanks for reaching out to us for this question. For 2D character lip sync, you can design a character that suits your scenario and use Scalable Vector Graphics (SVG) for each viseme ID to get a time-based face position. With the temporal tags provided by viseme events, these well-designed SVGs are processed with smoothing modifications and provide robust animation to users. For example, the illustration below shows a red-lipped character designed for language learning. You can try the red lip animation experience in Bing Translator and learn more about how visemes are used to demonstrate the correct pronunciations of words.

    ![Azure Neural Text-to-Speech extended to support lip sync with viseme](/api/attachments/4e50a9cb-1aa4-4024-8ef9-5184f9ad14c8?platform=QnA)

    Please refer to this document - https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-speech-synthesis-viseme?pivots=programming-language-csharp&tabs=2dsvg#viseme-id

    To get visemes with your synthesized speech, subscribe to the VisemeReceived event in the Speech SDK.

    Note

    To request SVG or blend shapes output, you should use the mstts:viseme element in SSML. For details, see how to use the viseme element in SSML.
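
    For example, the ssml string passed to SpeakSsmlAsync in the snippet below could be built roughly like this (a minimal sketch - the voice name and the redlips_front viseme type are assumptions for the 2D SVG scenario, so swap in your own voice, or use FacialExpression for blend shapes):

    // Minimal SSML sketch for SVG viseme output (the voice name and viseme type
    // are assumptions - adjust them to your own scenario).
    var ssml = @"<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis'
        xmlns:mstts='http://www.w3.org/2001/mstts' xml:lang='en-US'>
      <voice name='en-US-JennyNeural'>
        <mstts:viseme type='redlips_front'/>
        The rainbow has seven colors.
      </voice>
    </speak>";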

    The following snippet shows how to subscribe to the viseme event:

    using (var synthesizer = new SpeechSynthesizer(speechConfig, audioConfig))
    {
        // Subscribes to viseme received event
        synthesizer.VisemeReceived += (s, e) =>
        {
            Console.WriteLine($"Viseme event received. Audio offset: " +
                $"{e.AudioOffset / 10000}ms, viseme id: {e.VisemeId}.");
            // `Animation` is an xml string for SVG or a json string for blend shapes
            var animation = e.Animation;
        };
        // If VisemeID is the only thing you want, you can also use `SpeakTextAsync()`
        var result = await synthesizer.SpeakSsmlAsync(ssml);
    }
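
    The snippet above assumes speechConfig and audioConfig already exist; a minimal setup sketch (the key, region, and voice name are placeholders to replace with your own) could look like this:

    // Minimal setup sketch - "YourSpeechKey" and "YourServiceRegion" are placeholders.
    using Microsoft.CognitiveServices.Speech;
    using Microsoft.CognitiveServices.Speech.Audio;

    var speechConfig = SpeechConfig.FromSubscription("YourSpeechKey", "YourServiceRegion");
    speechConfig.SpeechSynthesisVoiceName = "en-US-JennyNeural";
    // Play the synthesized audio through the default speaker while the viseme events arrive.
    var audioConfig = AudioConfig.FromDefaultSpeakerOutput();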
    

    Here's an example of the viseme output.

    The SVG output is an XML string that contains the animation. Render the SVG animation along with the synthesized speech to see the mouth movement.

    <svg width="1200px" height="1200px" ..>
      <g id="front_start" stroke="none" stroke-width="1" fill="none" fill-rule="evenodd">
        <animate attributeName="d" begin="d_dh_front_background_1_0.end" dur="0.27500
        ...
    
    

    After you obtain the viseme output, you can use these events to drive character animation. You can build your own characters and automatically animate them.
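
    As a rough illustration of that idea (not an official sample - the frame file names below are made up), you could record each viseme event from the handler shown above and map the viseme IDs to your own pre-designed mouth frames:

    // Rough sketch: record each viseme event (audio offset in ms, viseme ID), then
    // map the IDs to your own pre-designed mouth frames. `synthesizer` is the
    // SpeechSynthesizer from the snippet above; the file names are made up.
    var timeline = new List<(double OffsetMs, uint VisemeId)>();
    synthesizer.VisemeReceived += (s, e) =>
    {
        timeline.Add((e.AudioOffset / 10000.0, e.VisemeId));
    };

    // Hypothetical lookup from viseme ID to your own SVG frame.
    var mouthFrames = new Dictionary<uint, string>
    {
        { 0, "mouth_silence.svg" },
        { 1, "mouth_open.svg" },
        { 21, "mouth_closed.svg" }
        // ...one entry per viseme ID your character supports
    };

    // After (or while) the audio plays, swap to the frame whose offset has been reached.
    foreach (var (offsetMs, visemeId) in timeline)
    {
        if (mouthFrames.TryGetValue(visemeId, out var frame))
        {
            Console.WriteLine($"{offsetMs:F0} ms -> {frame}");
        }
    }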

    Though there is no complete end-to-end sample for this, I have found an external example you may want to refer to - https://www.linkedin.com/pulse/azure-lip-sync-3d-model-animation-curtesy-amazon-peter-laker?trk=pulse-article_more-articles_related-content-card The author built the animation with various cloud services, including Azure.

    This is the GitHub repo - https://github.com/ProjectPete/amazon-sumerian-hosts/blob/mainline/examples/three-azure-v2.html

    Regards,

    Yutong

    -Please kindly accept the answer if you find it helpful, to support the community. Thanks a lot.

    1 person found this answer helpful.

0 additional answers
