Hello DARSHIL SHAH7,
Thank you for reaching out and for providing detailed observations.
Based on your description, the behavior you are seeing is expected with the current Live/Interactive Avatar capability.
There is a fundamental difference between:
- Text-to-Speech (script-based) Avatar
- Live / Interactive Avatar
In the Text-to-Speech Avatar (script-based) flow:
- The system uses an offline rendering pipeline
- It processes the full script in advance
This allows it to generate richer animations, including:
- Natural hand movements
- Body gestures
- More refined lip synchronization
In contrast, the Live / Interactive Avatar:
- Is designed for real-time streaming scenarios
- Operates under low-latency constraints
- Currently supports facial animation only (lip sync and limited head movement)
At this time, automatic hand or full-body movements are not supported in the Live pipeline.
Clarifications
This behavior is consistent across:
- Custom avatars
- Prebuilt avatars (e.g., Lisa, Harry)
The amplitude parameter:
- Affects facial expressiveness
- Applies mainly to photo-based avatars
- Is not effective for video-based avatars, including custom avatars and built-in video avatars
Training your avatar with videos containing hand movements:
- Improves visual quality and identity
- Does not enable gesture reproduction in real-time mode
Workarounds
For scenarios requiring natural hand gestures, use the Text-to-Speech Avatar workflow
In Live scenarios:
- You may use manual gesture controls via the SDK
- There is currently no automatic hand movement capability
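To make the trade-off concrete, here is a minimal sketch of how an application might pick a workflow from its requirements. It only encodes the capability summary given in this answer; the names `CAPABILITIES`, `Requirements`, and `choose_pipeline` are hypothetical and are not part of the Speech SDK or any Azure API.

```python
from dataclasses import dataclass

# Capability summary as described in this answer (illustrative labels only,
# not identifiers from the Speech SDK).
CAPABILITIES = {
    "text_to_speech_avatar": {"lip_sync", "head_movement", "hand_movement", "body_gestures"},
    "live_avatar": {"lip_sync", "head_movement"},
}


@dataclass(frozen=True)
class Requirements:
    needs_real_time: bool       # low-latency, interactive streaming
    needs_hand_gestures: bool   # natural, automatic hand or body movement


def choose_pipeline(req: Requirements) -> tuple[str, str]:
    """Return (pipeline, note) based on the constraints described above."""
    if req.needs_real_time and req.needs_hand_gestures:
        # Both cannot be satisfied automatically today; real time wins here,
        # and the note flags the limitation.
        return (
            "live_avatar",
            "No automatic hand movement in real time; consider manual gesture "
            "controls or the script-based workflow.",
        )
    if req.needs_real_time:
        return ("live_avatar", "Facial animation only (lip sync, limited head movement).")
    # Without a real-time constraint, the offline pipeline gives richer animation.
    return ("text_to_speech_avatar", "Offline rendering with hand and body gestures.")


def supports(pipeline: str, feature: str) -> bool:
    """Check whether a pipeline offers a feature according to the summary table."""
    return feature in CAPABILITIES[pipeline]
```

For example, `choose_pipeline(Requirements(needs_real_time=False, needs_hand_gestures=True))` selects the script-based workflow, while requesting both real time and hand gestures returns the Live pipeline together with a note about the limitation.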
Please refer to the following documentation:
Custom Avatar overview & creation: https://learn.microsoft.com/azure/ai-services/speech-service/text-to-speech-avatar/custom-avatar-create?pivots=ai-foundry-portal
Voice Live API / real-time avatar guide: https://learn.microsoft.com/azure/ai-services/speech-service/voice-live-how-to
Real-time synthesis (scene/amplitude notes): https://learn.microsoft.com/azure/ai-services/speech-service/text-to-speech-avatar/real-time-synthesis-avatar#set-video-resolution
I hope this helps. Please let me know if you have any further questions.
Thank you!