Viseme support for python

Sudhir Dass 20 Reputation points
2023-05-30T16:00:43.8366667+00:00

Hi, I want viseme support in a Python project, but I can't find any useful sources.

Also, I wonder: if we are doing a non-profit project for people with disabilities, are there any discounts or benefits from Microsoft?

Azure AI Speech
An Azure service that integrates speech processing into apps and services.

Accepted answer
  1. YutongTie-MSFT 53,966 Reputation points Moderator
    2023-05-30T16:38:39.0933333+00:00

    Hello @Sudhir Dass

    Thanks for reaching out to us. Yes, of course: viseme ID supports neural voices in all viseme-supported locales, Scalable Vector Graphics (SVG) supports neural voices only in the en-US locale, and blend shapes support neural voices in the en-US and zh-CN locales (a sketch of the SSML that selects each output type follows the list below). You can use visemes to control the movement of 2D and 3D avatar models, so that the facial positions align as closely as possible with the synthetic speech. For example, you can:

    • Create an animated virtual voice assistant for intelligent kiosks, building multi-mode integrated services for your customers.
    • Build immersive news broadcasts and improve audience experiences with natural face and mouth movements.
    • Generate more interactive gaming avatars and cartoon characters that can speak with dynamic content.
    • Make more effective language teaching videos that help language learners understand the mouth behavior of each word and phoneme.
    • Help people with hearing impairments pick up sounds visually and "lip-read" speech content, by showing visemes on an animated face.
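
    Which output you receive is controlled by the SSML you send to the synthesizer. Here is a minimal sketch, assuming the standard `en-US-JennyNeural` voice and placeholder text; per the viseme how-to, the `mstts:viseme` element's `type` attribute selects SVG (`redlips_front`) or blend shapes (`FacialExpression`), and leaving the element out still produces viseme ID events:

    ssml = """<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis'
        xmlns:mstts='http://www.w3.org/2001/mstts' xml:lang='en-US'>
        <voice name='en-US-JennyNeural'>
            <!-- type='FacialExpression' requests blend shapes; use
                 type='redlips_front' for SVG, or omit this element
                 if viseme IDs alone are enough. -->
            <mstts:viseme type='FacialExpression'/>
            Rainbow has seven colors.
        </voice>
    </speak>"""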

    The following snippet shows how to subscribe to the viseme event:

    import azure.cognitiveservices.speech as speechsdk

    # Placeholder key and region; replace with your own values.
    speech_config = speechsdk.SpeechConfig(subscription="YOUR_SPEECH_KEY", region="YOUR_SPEECH_REGION")
    audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)
    speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)

    def viseme_cb(evt):
        # audio_offset is in ticks (100 ns), so divide by 10,000 for milliseconds
        print("Viseme event received: audio offset: {}ms, viseme id: {}.".format(
            evt.audio_offset / 10000, evt.viseme_id))

        # `animation` is an XML string for SVG or a JSON string for blend shapes
        animation = evt.animation

    # Subscribe to the viseme-received event
    speech_synthesizer.viseme_received.connect(viseme_cb)

    # If the viseme ID is the only thing you want, you can also use `speak_text_async()`
    result = speech_synthesizer.speak_ssml_async(ssml).get()

    Here's an example of the viseme output.

    (Viseme), Viseme ID: 1, Audio offset: 200ms.
    (Viseme), Viseme ID: 5, Audio offset: 850ms.
    ……
    (Viseme), Viseme ID: 13, Audio offset: 2350ms.
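
    If you request blend shapes, each viseme event also carries a JSON chunk in `evt.animation`. As a minimal sketch, assuming the documented layout of a `FrameIndex` field plus a two-dimensional `BlendShapes` array (one row of facial-weight values per animation frame), you could decode it inside the callback like this:

    import json

    def viseme_cb(evt):
        # `animation` is empty for plain viseme ID events
        if evt.animation:
            chunk = json.loads(evt.animation)
            frame_index = chunk["FrameIndex"]  # index of the first frame in this chunk
            frames = chunk["BlendShapes"]      # one list of facial-weight values per frame
            print("Chunk starts at frame {} and contains {} frames.".format(
                frame_index, len(frames)))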
    

    For more information, please refer to the documentation: https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-speech-synthesis-viseme?tabs=visemeid&pivots=programming-language-python

    I hope this helps.

    Regards,

    Yutong

    -Please kindly accept the answer and vote 'Yes' if you find it helpful, to support the community. Thanks a lot.

