Is there any AI service or API which can drive digital human, e.g., audio2face or audio2gesture ?

Huiyuan SUN 0 Reputation points
2023-06-06T09:15:40.01+00:00

Is there any AI service or API which can drive digital human, e.g., audio2face or audio2gesture ?

Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.

1 answer

  1. YutongTie-MSFT 53,241 Reputation points
    2023-06-07T19:52:12.2866667+00:00

    Hello @Huiyuan SUN

    Thanks for reaching out to us. Yes, there are AI services and APIs that can drive digital humans from audio. One example is the Azure Cognitive Services Speech Service, whose text-to-speech (speech synthesis) API can emit time-stamped viseme events alongside the synthesized audio. These events can then be used to drive a digital human's facial expressions and lip movements.

    Currently, Azure Speech Services supports 2D and 3D animation.

    For 2D characters, you can design a character that suits your scenario and use Scalable Vector Graphics (SVG) for each viseme ID to get a time-based face position.
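    As a minimal sketch of the 2D approach: each viseme event pairs an audio offset with a viseme ID, and the client swaps in the matching SVG mouth shape at that time. The event tuples and the SVG file-naming scheme below are illustrative assumptions, not the Speech SDK's actual API (in the SDK, events arrive via a `viseme_received` callback).

    ```python
    # Sketch: turning a stream of (audio_offset_ms, viseme_id) events into a
    # timed sequence of 2D mouth-shape frames. Event tuples and asset paths
    # here are hypothetical.

    def build_2d_timeline(viseme_events):
        """Map each viseme event to the SVG asset for that viseme ID.

        viseme_events: list of (audio_offset_ms, viseme_id) pairs, assumed
        sorted by offset, as they would arrive during synthesis.
        """
        timeline = []
        for offset_ms, viseme_id in viseme_events:
            timeline.append({
                "show_at_ms": offset_ms,
                "svg": f"visemes/viseme_{viseme_id}.svg",  # hypothetical asset path
            })
        return timeline

    # Example: a short made-up viseme sequence.
    events = [(0, 0), (50, 12), (180, 4), (330, 7), (520, 0)]
    frames = build_2d_timeline(events)
    print(frames[1]["svg"])  # -> visemes/viseme_12.svg
    ```

    A renderer would then display each SVG at its `show_at_ms` offset while the synthesized audio plays.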

    You can use blend shapes to drive the facial movements of a 3D character that you designed.

    The blend shapes JSON string is represented as a 2-dimensional matrix. Each row represents a frame. Each frame (in 60 FPS) contains an array of 55 facial positions.
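    To illustrate the structure described above, here is a hedged sketch that decodes such a payload and attaches a timestamp to each frame. It assumes the matrix sits under a top-level `"BlendShapes"` key; the sample payload is fabricated, so check the linked documentation for the exact schema your SDK version emits.

    ```python
    import json

    # Sketch: decoding a blend-shapes payload (one row per frame at 60 FPS,
    # 55 facial positions per row). The "BlendShapes" key and the sample
    # payload are assumptions for illustration.

    FPS = 60
    VALUES_PER_FRAME = 55

    def decode_blend_shapes(json_string):
        data = json.loads(json_string)
        frames = data["BlendShapes"]
        timed = []
        for i, row in enumerate(frames):
            assert len(row) == VALUES_PER_FRAME, "each frame has 55 positions"
            timed.append({"time_s": i / FPS, "weights": row})
        return timed

    # Two fabricated frames of 55 zeros each; the second lands at 1/60 s.
    payload = json.dumps({"BlendShapes": [[0.0] * 55, [0.0] * 55]})
    frames = decode_blend_shapes(payload)
    print(len(frames))  # -> 2
    ```

    Each decoded frame can then be fed to a 3D engine as blend-shape weights for the corresponding playback time.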

    For more information about how to do this with the Azure Speech Service, please refer to the document below:

    https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-speech-synthesis-viseme?tabs=visemeid&pivots=programming-language-python

    Another example is the Azure Kinect Sensor SDK, which pairs with the Azure Kinect Body Tracking SDK to track a person's body joints and gestures in real time. This skeletal data can be used to drive a digital human's movements and gestures.
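    As a rough sketch of how tracked joints can drive a gesture: the function below flags a raised hand by comparing wrist and head heights. The joint dictionaries are hypothetical stand-ins for what a body tracker would report (assuming y points up); this is not the Body Tracking SDK's API.

    ```python
    # Sketch: mapping body-tracking joint positions to a simple gesture flag
    # that a digital human could mirror. Joint names, coordinates, and the
    # y-up convention are illustrative assumptions.

    def is_hand_raised(frame, margin_m=0.05):
        """True if the right wrist is above the head by at least margin_m metres.

        frame: dict of joint name -> (x, y, z), with y pointing up.
        """
        wrist_y = frame["wrist_right"][1]
        head_y = frame["head"][1]
        return wrist_y > head_y + margin_m

    # Two fabricated frames: arm lowered, then raised overhead.
    lowered = {"head": (0.0, 1.6, 0.0), "wrist_right": (0.3, 1.0, 0.1)}
    raised = {"head": (0.0, 1.6, 0.0), "wrist_right": (0.3, 1.8, 0.1)}
    print(is_hand_raised(lowered), is_hand_raised(raised))  # -> False True
    ```

    A real pipeline would run a classifier like this per frame and retarget the detected gesture onto the digital human's rig.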

    There are also third-party tools and platforms that specialize in creating and animating digital humans, such as Reallusion's iClone and Unreal Engine's MetaHuman Creator. These tools often include AI-driven features for generating facial expressions and animations based on audio input.

    I hope this helps.

    Regards,

    Yutong

    -Please accept the answer and vote 'Yes' if you found it helpful, to support the community. Thanks a lot.

