Interactive/Live Avatar not Showing Hand Movements

DARSHIL SHAH7 80 Reputation points
2026-04-16T05:46:52.4433333+00:00

I recently trained and deployed a Custom Avatar using Azure Speech Studio. I trained the avatar on videos that contained significant hand movements.
To test the avatar, I first checked it in the Text To Speech Avatar section, where I gave a script as input and it generated a video of my custom avatar reciting that script, with hand movements as well. This worked perfectly.

When I then tested it as a Live/Interactive Avatar, the avatar showed no hand movements at all, whereas the same avatar in the Text To Speech Avatar section showed significant hand movements.
The avatar stood straight and rigid, with no movement except the slight lip movement it made while trying to lip sync. Even the lip sync was not as good as in the Text To Speech Avatar section (script-based avatar).
I also ran the same test with the prebuilt avatars provided by Azure (Lisa, Harry).
They showed the same results: they stood perfectly still and displayed no movement.

To further test it, I even tried to set up a custom UI where I would call the avatar and set the "amplitude" parameter (responsible for determining how much movement the avatar will display) to the maximum value (1) but still I observed no change.

Why are no hand movements being displayed?
The same avatars being used in Text to Speech Section (Script Based Avatars) display proper hand movements.
The custom avatar was even trained on videos which had significant hand movements.
How can I enable these hand movements in the Interactive/Live avatar?

Just to be clear, I am not talking about gestures here, but rather hand movements which the avatar automatically performs without the user having to manually insert them.

Azure Speech in Foundry Tools

Answer accepted by question author

SRILAKSHMI C 19,005 Reputation points Microsoft External Staff Moderator
2026-04-16T17:18:59.76+00:00

Hello DARSHIL SHAH7,

Thank you for reaching out and for providing detailed observations.

Based on your description, the behavior you are seeing is expected with the current Live/Interactive Avatar capability.

There is a fundamental difference between:

  • Text-to-Speech (script-based) Avatar
  • Live / Interactive Avatar

In the Text-to-Speech Avatar (script-based) flow:

  • The system uses an offline rendering pipeline.
  • It processes the full script in advance.
  • This allows it to generate richer animations, including:
      • Natural hand movements
      • Body gestures
      • More refined lip synchronization

In contrast, the Live / Interactive Avatar:

  • Is designed for real-time streaming scenarios
  • Operates under low-latency constraints
  • Currently supports facial animation only (lip sync, limited head movement)

At this time, automatic hand or full-body movements are not supported in the Live pipeline.

Clarifications

This behavior is consistent across:

  • Custom avatars
  • Prebuilt avatars (e.g., Lisa, Harry)

The amplitude parameter:

  • Affects facial expressiveness
  • Applies mainly to photo-based avatars
  • Is not effective for video-based avatars, including custom avatars and built-in video avatars
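
As a rough illustration of the amplitude point above, the sketch below builds scene settings only for photo-based avatars and sends nothing for video-based ones. The `build_scene` helper, the `"photo"`/`"video"` labels, the payload shape, and the lower bound of 0 are hypothetical and not taken from the Speech SDK; only the parameter name `amplitude` and its maximum of 1 come from this thread.

```python
from typing import Optional


def build_scene(avatar_kind: str, amplitude: float) -> Optional[dict]:
    """Return hypothetical scene settings, or None when amplitude has no effect.

    Assumption: "photo" and "video" are illustrative labels, not SDK values.
    """
    if avatar_kind not in ("photo", "video"):
        raise ValueError(f"unknown avatar kind: {avatar_kind!r}")
    if avatar_kind == "video":
        # Per the answer above, amplitude is not effective for video-based
        # avatars (custom or built-in), so nothing is sent for them.
        return None
    # Clamp to an assumed 0..1 range; 1 is the maximum named in the question.
    clamped = max(0.0, min(1.0, amplitude))
    return {"amplitude": clamped}
```

With this shape, a caller can tell from a `None` result that raising the amplitude will not change a video-based avatar, which matches what was observed in the question.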

Training your avatar with videos containing hand movements:

  • Improves visual quality and identity
  • Does not enable gesture reproduction in real-time mode

Workarounds

For scenarios requiring natural hand gestures, use the Text-to-Speech Avatar workflow

In Live scenarios:

  • You may use manual gesture controls via the SDK
  • There is currently no automatic hand movement capability
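
Manually inserted gestures are usually written as SSML bookmark elements placed in the text to be spoken. The sketch below shows one way such SSML could be assembled. The `ssml_with_gesture` helper is hypothetical, the gesture name `show-front-1` and the voice `en-US-JennyNeural` are placeholders, and valid gesture names depend on the avatar character and style. Whether a given bookmark is honored in a live session, rather than only in script-based synthesis, should be verified against the documentation linked below.

```python
from xml.sax.saxutils import escape, quoteattr


def ssml_with_gesture(before: str, gesture: str, after: str,
                      voice: str = "en-US-JennyNeural") -> str:
    """Build SSML with a gesture bookmark between two pieces of text.

    Assumption: the gesture and voice names are placeholders for illustration.
    """
    # quoteattr escapes the value and wraps it in quotes for attribute use.
    mark = quoteattr(f"gesture.{gesture}")
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xml:lang="en-US">'
        f"<voice name={quoteattr(voice)}>"
        # escape() protects characters such as & and < in the spoken text.
        f"{escape(before)}<bookmark mark={mark}/>{escape(after)}"
        "</voice></speak>"
    )
```

For example, `ssml_with_gesture("Welcome. ", "show-front-1", "Let's begin.")` places the bookmark at the point in the sentence where the gesture is meant to start.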

Please refer to the following documentation:

Custom Avatar overview & creation: https://learn.microsoft.com/azure/ai-services/speech-service/text-to-speech-avatar/custom-avatar-create?pivots=ai-foundry-portal

Voice Live API / real-time avatar guide: https://learn.microsoft.com/azure/ai-services/speech-service/voice-live-how-to

Real-time synthesis (scene/amplitude notes): https://learn.microsoft.com/azure/ai-services/speech-service/text-to-speech-avatar/real-time-synthesis-avatar#set-video-resolution

I hope this helps. Do let me know if you have any further queries.

Thank you!
