code sample for lip sync

bill carter 20 Reputation points
2023-02-23T21:09:55.3833333+00:00

I am working on providing pronunciation teaching with the Speech service. I need the pronunciation with lip gestures. Can someone provide me a code sample?

Azure AI Speech
An Azure service that integrates speech processing into apps and services.

Accepted answer
  1. YutongTie-MSFT 53,966 Reputation points Moderator
    2023-02-23T22:11:47.26+00:00

    Hello bill carter,

    Thanks for reaching out to us for this question. For 2D character lip sync, you can design a character that suits your scenario and use Scalable Vector Graphics (SVG) for each viseme ID to get a time-based face position. With the temporal tags provided by viseme events, these well-designed SVGs are processed with smoothing modifications and provide robust animation to users. For example, the illustration below shows a red-lipped character designed for language learning. You can try the red lip animation experience in Bing Translator and learn more about how visemes are used to demonstrate the correct pronunciations of words.

    ![Azure Neural Text-to-Speech extended to support lip sync with viseme](/api/attachments/4e50a9cb-1aa4-4024-8ef9-5184f9ad14c8?platform=QnA)

    Please refer to this document - https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-speech-synthesis-viseme?pivots=programming-language-csharp&tabs=2dsvg#viseme-id

    To get visemes with your synthesized speech, subscribe to the VisemeReceived event in the Speech SDK.

    Note

    To request SVG or blend shapes output, you should use the mstts:viseme element in SSML. For details, see how to use the viseme element in SSML.
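
    For example, the ssml string passed to SpeakSsmlAsync in the snippet below could be built roughly like this (a minimal sketch - the voice name and the redlips_front viseme type are assumptions for the 2D SVG scenario, so swap in your own voice, or use FacialExpression for blend shapes):

    // Minimal SSML sketch for SVG viseme output (the voice name and viseme type
    // are assumptions - adjust them to your own scenario).
    var ssml = @"<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis'
        xmlns:mstts='http://www.w3.org/2001/mstts' xml:lang='en-US'>
      <voice name='en-US-JennyNeural'>
        <mstts:viseme type='redlips_front'/>
        The rainbow has seven colors.
      </voice>
    </speak>";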

    The following snippet shows how to subscribe to the viseme event:

    using (var synthesizer = new SpeechSynthesizer(speechConfig, audioConfig))
    {
        // Subscribes to viseme received event
        synthesizer.VisemeReceived += (s, e) =>
        {
            Console.WriteLine($"Viseme event received. Audio offset: " +
                $"{e.AudioOffset / 10000}ms, viseme id: {e.VisemeId}.");
            // `Animation` is an xml string for SVG or a json string for blend shapes
            var animation = e.Animation;
        };
        // If VisemeID is the only thing you want, you can also use `SpeakTextAsync()`
        var result = await synthesizer.SpeakSsmlAsync(ssml);
    }
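
    The snippet above assumes speechConfig and audioConfig already exist; a minimal setup sketch (the key, region, and voice name are placeholders to replace with your own) could look like this:

    // Minimal setup sketch - "YourSpeechKey" and "YourServiceRegion" are placeholders.
    using Microsoft.CognitiveServices.Speech;
    using Microsoft.CognitiveServices.Speech.Audio;

    var speechConfig = SpeechConfig.FromSubscription("YourSpeechKey", "YourServiceRegion");
    speechConfig.SpeechSynthesisVoiceName = "en-US-JennyNeural";
    // Play the synthesized audio through the default speaker while the viseme events arrive.
    var audioConfig = AudioConfig.FromDefaultSpeakerOutput();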
    

    Here's an example of the viseme output.

    The SVG output is an XML string that contains the animation. Render the SVG animation along with the synthesized speech to see the mouth movement.

    <svg width="1200px" height="1200px" ..>
      <g id="front_start" stroke="none" stroke-width="1" fill="none" fill-rule="evenodd">
        <animate attributeName="d" begin="d_dh_front_background_1_0.end" dur="0.27500
        ...
    
    

    After you obtain the viseme output, you can use these events to drive character animation. You can build your own characters and automatically animate them.
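
    As a rough illustration of that idea (not an official sample - the frame file names below are made up), you could record each viseme event from the handler shown above and map the viseme IDs to your own pre-designed mouth frames:

    // Rough sketch: record each viseme event (audio offset in ms, viseme ID), then
    // map the IDs to your own pre-designed mouth frames. `synthesizer` is the
    // SpeechSynthesizer from the snippet above; the file names are made up.
    var timeline = new List<(double OffsetMs, uint VisemeId)>();
    synthesizer.VisemeReceived += (s, e) =>
    {
        timeline.Add((e.AudioOffset / 10000.0, e.VisemeId));
    };

    // Hypothetical lookup from viseme ID to your own SVG frame.
    var mouthFrames = new Dictionary<uint, string>
    {
        { 0, "mouth_silence.svg" },
        { 1, "mouth_open.svg" },
        { 21, "mouth_closed.svg" }
        // ...one entry per viseme ID your character supports
    };

    // After (or while) the audio plays, swap to the frame whose offset has been reached.
    foreach (var (offsetMs, visemeId) in timeline)
    {
        if (mouthFrames.TryGetValue(visemeId, out var frame))
        {
            Console.WriteLine($"{offsetMs:F0} ms -> {frame}");
        }
    }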

    Though there is no complete end-to-end sample for this, I have found an external example you may want to refer to - https://www.linkedin.com/pulse/azure-lip-sync-3d-model-animation-curtesy-amazon-peter-laker?trk=pulse-article_more-articles_related-content-card The author built the animation with various cloud services, including Azure.

    This is the GitHub repo - https://github.com/ProjectPete/amazon-sumerian-hosts/blob/mainline/examples/three-azure-v2.html

    Regards,

    Yutong

    -Please kindly accept the answer if you find it helpful, to support the community. Thanks a lot.

    1 person found this answer helpful.

0 additional answers
