Viseme support for python

Sudhir Dass 20 Reputation points
2023-05-30T16:00:43.8366667+00:00

Hi, I want viseme support in a Python project, but I can't find any useful sources.

Also, I wonder: if we are doing a non-profit project for people with disabilities, are there any discounts or benefits from Microsoft?

Azure AI Speech
An Azure service that integrates speech processing into apps and services.

Accepted answer
  1. YutongTie-MSFT 53,966 Reputation points Moderator
    2023-05-30T16:38:39.0933333+00:00

    Hello @Sudhir Dass

    Thanks for reaching out to us. Yes, of course: viseme ID supports neural voices in all viseme-supported locales, Scalable Vector Graphics (SVG) supports neural voices only in the en-US locale, and blend shapes support neural voices in the en-US and zh-CN locales (a sketch of the SSML that selects each output type follows the list below). You can use visemes to control the movement of 2D and 3D avatar models, so that the facial positions align as closely as possible with the synthetic speech. For example, you can:

    • Create an animated virtual voice assistant for intelligent kiosks, building multi-mode integrated services for your customers.
    • Build immersive news broadcasts and improve audience experiences with natural face and mouth movements.
    • Generate more interactive gaming avatars and cartoon characters that can speak with dynamic content.
    • Make more effective language teaching videos that help language learners understand the mouth behavior of each word and phoneme.
    • Help people with hearing impairments pick up sounds visually and "lip-read" speech content, by showing visemes on an animated face.
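
    Which output you receive is controlled by the SSML you send to the synthesizer. Here is a minimal sketch, assuming the standard `en-US-JennyNeural` voice and placeholder text; per the viseme how-to, the `mstts:viseme` element's `type` attribute selects SVG (`redlips_front`) or blend shapes (`FacialExpression`), and leaving the element out still produces viseme ID events:

    ssml = """<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis'
        xmlns:mstts='http://www.w3.org/2001/mstts' xml:lang='en-US'>
        <voice name='en-US-JennyNeural'>
            <!-- type='FacialExpression' requests blend shapes; use
                 type='redlips_front' for SVG, or omit this element
                 if viseme IDs alone are enough. -->
            <mstts:viseme type='FacialExpression'/>
            Rainbow has seven colors.
        </voice>
    </speak>"""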

    The following snippet shows how to subscribe to the viseme event:

    import azure.cognitiveservices.speech as speechsdk

    # Placeholder key and region; replace with your own values.
    speech_config = speechsdk.SpeechConfig(subscription="YOUR_SPEECH_KEY", region="YOUR_SPEECH_REGION")
    audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)
    speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)

    def viseme_cb(evt):
        # audio_offset is in ticks (100 ns), so divide by 10,000 for milliseconds
        print("Viseme event received: audio offset: {}ms, viseme id: {}.".format(
            evt.audio_offset / 10000, evt.viseme_id))

        # `animation` is an XML string for SVG or a JSON string for blend shapes
        animation = evt.animation

    # Subscribe to the viseme-received event
    speech_synthesizer.viseme_received.connect(viseme_cb)

    # If the viseme ID is the only thing you want, you can also use `speak_text_async()`
    result = speech_synthesizer.speak_ssml_async(ssml).get()

    Here's an example of the viseme output.

    (Viseme), Viseme ID: 1, Audio offset: 200ms.
    (Viseme), Viseme ID: 5, Audio offset: 850ms.
    ……
    (Viseme), Viseme ID: 13, Audio offset: 2350ms.
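
    If you request blend shapes, each viseme event also carries a JSON chunk in `evt.animation`. As a minimal sketch, assuming the documented layout of a `FrameIndex` field plus a two-dimensional `BlendShapes` array (one row of facial-weight values per animation frame), you could decode it inside the callback like this:

    import json

    def viseme_cb(evt):
        # `animation` is empty for plain viseme ID events
        if evt.animation:
            chunk = json.loads(evt.animation)
            frame_index = chunk["FrameIndex"]  # index of the first frame in this chunk
            frames = chunk["BlendShapes"]      # one list of facial-weight values per frame
            print("Chunk starts at frame {} and contains {} frames.".format(
                frame_index, len(frames)))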
    

    For more information, please refer to the documentation: https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-speech-synthesis-viseme?tabs=visemeid&pivots=programming-language-python

    I hope this helps.

    Regards,

    Yutong

    -Please kindly accept the answer and vote 'Yes' if you find it helpful, to support the community. Thanks a lot.

