
Custom Avatar Quality Not Up to the Mark

DARSHIL SHAH7 60 Reputation points
2026-03-27T09:49:29.7133333+00:00

I recently created my own custom avatar using Azure Speech Studio.
Even though the training videos were recorded in a studio with a professional camera and a green screen background, when an avatar video is generated (batch synthesis), the hands often become invisible or merge with the background when they move away from the body.
This happens not only when the hand movements are fast, but also when they are slow.
As soon as the hands move beyond the outline of the body, they become invisible or merge with the background.
I don't understand why this happens even though the quality of the training video was good.
Could you guide me on what the possible issue may be and how to resolve it?

Azure AI Speech

An Azure service that integrates speech processing into apps and services.


Answer accepted by question author
  Anshika Varshney 9,655 Reputation points Microsoft External Staff Moderator
    2026-04-05T17:59:37.8433333+00:00

    Hi DARSHIL SHAH7,
    What you are seeing with the hands disappearing is a known behavior with Azure Custom Avatars, and it is not caused by camera quality or studio setup.

    Azure Custom Avatars are mainly optimized for face and upper body talking scenarios. The avatar model learns a limited body area from the training videos. When hands move outside that learned body area, the model is not able to track them properly. Because of this, hands can appear invisible or blend into the background, even when the movement is slow and the green screen quality is good.

    Here are a few things that usually help improve the result.

    Try to keep hand movements close to the torso. Avoid wide arm extensions or gestures that move far away from the body. The avatar works best when gestures stay within a consistent and narrow range.
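
    If you want to check existing footage against this guideline before retraining, a rough sketch along the following lines can help. This is my own illustration, not an Azure tool; it assumes the opencv-python and mediapipe packages, and the 0.15 margin is an arbitrary threshold:

    ```python
    # Sketch: measure how often the wrists stay within the shoulder span
    # (plus a margin) in a training clip, using MediaPipe Pose.
    # The margin value is illustrative, not an Azure requirement.
    import cv2
    import mediapipe as mp

    mp_pose = mp.solutions.pose

    def wrists_within_torso(video_path, margin=0.15):
        """Return the fraction of frames where both wrists stay inside
        the shoulder span widened by `margin` (normalized x units)."""
        cap = cv2.VideoCapture(video_path)
        ok, total = 0, 0
        with mp_pose.Pose(static_image_mode=False) as pose:
            while True:
                ret, frame = cap.read()
                if not ret:
                    break
                result = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
                if not result.pose_landmarks:
                    continue
                lm = result.pose_landmarks.landmark
                xs = [lm[mp_pose.PoseLandmark.LEFT_SHOULDER].x,
                      lm[mp_pose.PoseLandmark.RIGHT_SHOULDER].x]
                lo, hi = min(xs) - margin, max(xs) + margin
                total += 1
                if (lo <= lm[mp_pose.PoseLandmark.LEFT_WRIST].x <= hi and
                        lo <= lm[mp_pose.PoseLandmark.RIGHT_WRIST].x <= hi):
                    ok += 1
        cap.release()
        return ok / max(total, 1)

    print(f"wrists near torso: {wrists_within_torso('clip01.mp4'):.0%} of frames")
    ```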

    During training, deliberately include clips with controlled hand movement. Make sure some clips show the hands moving left, right, slightly above the shoulders, and slightly below the waist. At the same time, make sure the hands stay fully inside the camera frame and do not touch the edges.

    Framing is important. Capture the full upper body with enough space on both sides, so hands never go near the frame boundary. Hands close to the edge are more likely to disappear.
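
    To audit framing objectively, a similar sketch using MediaPipe Hands can flag frames where hand landmarks come close to the border. Again, this is only an illustration; the 5% edge margin is my assumption, not a documented requirement:

    ```python
    # Sketch: flag frames where any detected hand landmark falls within
    # `edge_margin` (normalized units) of the frame boundary.
    import cv2
    import mediapipe as mp

    mp_hands = mp.solutions.hands

    def frames_near_edge(video_path, edge_margin=0.05):
        cap = cv2.VideoCapture(video_path)
        flagged, idx = [], 0
        with mp_hands.Hands(max_num_hands=2) as hands:
            while True:
                ret, frame = cap.read()
                if not ret:
                    break
                result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
                for hand in (result.multi_hand_landmarks or []):
                    if any(p.x < edge_margin or p.x > 1 - edge_margin or
                           p.y < edge_margin or p.y > 1 - edge_margin
                           for p in hand.landmark):
                        flagged.append(idx)
                        break
                idx += 1
        cap.release()
        return flagged

    bad = frames_near_edge("clip01.mp4")
    print(f"{len(bad)} frames have hands near the edge, first few: {bad[:5]}")
    ```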

    Lighting and background consistency also matter. Use even lighting, avoid shadows on hands, reduce green screen spill, and avoid reflective clothing or accessories. These things help the model separate hands from the background more accurately.
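
    One way to sanity-check green-screen spill before submitting footage is a quick OpenCV pass like the one below. The HSV thresholds are illustrative and will need tuning for your backdrop; this is not an official validation step:

    ```python
    # Sketch: key out the saturated green backdrop, then measure how many
    # of the remaining (subject) pixels still lean green, which usually
    # indicates spill on skin, hair, or clothing edges.
    import cv2
    import numpy as np

    def green_spill_ratio(frame_bgr):
        hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
        backdrop = cv2.inRange(hsv, (35, 80, 80), (85, 255, 255))   # strong green
        subject = cv2.bitwise_not(backdrop)
        greenish = cv2.inRange(hsv, (35, 30, 30), (85, 255, 255))   # any green
        spill = cv2.bitwise_and(greenish, subject)
        return np.count_nonzero(spill) / max(np.count_nonzero(subject), 1)

    frame = cv2.imread("training_frame.png")
    print(f"approx. green spill on subject: {green_spill_ratio(frame):.1%}")
    ```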

    Keep gestures slow and smooth. Sudden or large movements make it harder for the avatar model to maintain stable hand tracking.
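
    Motion smoothness can be measured rather than judged by eye. A simple frame-differencing pass like this flags abrupt movement; the 3-sigma spike threshold is an arbitrary choice on my part:

    ```python
    # Sketch: score per-frame motion with grayscale frame differencing.
    # Spikes in the mean difference usually mark sudden, large gestures.
    import cv2
    import numpy as np

    def motion_profile(video_path):
        cap = cv2.VideoCapture(video_path)
        prev, diffs = None, []
        while True:
            ret, frame = cap.read()
            if not ret:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            if prev is not None:
                diffs.append(float(cv2.absdiff(gray, prev).mean()))
            prev = gray
        cap.release()
        return np.array(diffs)

    d = motion_profile("clip01.mp4")
    spikes = np.where(d > d.mean() + 3 * d.std())[0]
    print(f"{len(spikes)} abrupt-motion frames out of {len(d)}")
    ```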

    It is also important to set expectations correctly. Azure Custom Avatars focus on facial realism, not full body or skeletal realism. Wide gestures, detailed hand tracking, and occlusion handling have known limitations at the current stage.

    I hope this helps. Please feel free to let me know if you have any further questions or need clarification on any of the points above.

    Thank you!

    1 person found this answer helpful.

Answer accepted by question author
  Sina Salam 28,361 Reputation points Volunteer Moderator
    2026-03-30T01:04:42.6133333+00:00

    Hello DARSHIL SHAH7,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    I understand that your custom avatar quality is not up to the mark.

    The issue isn't caused by video quality; it results from limits in how the Azure avatar model handles limb movement outside its learned body zone. The most dependable approach is to retrain with controlled gesture ranges and keep motions inside model-safe spatial boundaries. Reference: Azure AI Speech Avatars - https://learn.microsoft.com/azure/ai-services/speech-avatar/overview

    Follow the steps below to resolve the issue:

    1. Keep hand motions within torso width and avoid extending arms far from the body silhouette. This matches the model’s supported posture range. Model guidelines: https://learn.microsoft.com/azure/ai-services/speech-avatar/concepts/avatar-studio
    2. Include training clips showing hands left, right, above shoulders, and below waist, while maintaining a consistent background and avoiding frame edges. Training guidance: https://learn.microsoft.com/azure/ai-services/speech-avatar/concepts/custom-avatar
    3. Use uniform lighting, avoid shadows, eliminate green-screen spill, and prevent reflective surfaces. This strengthens segmentation accuracy. Lighting & capture requirements: https://learn.microsoft.com/azure/ai-services/speech-avatar/how-to/preparation
    4. Ensure at least 20–30% of your dataset features controlled hand movement, not only static talking posture, to help the model learn arm behavior effectively (see the sketch after this list).
    5. Do not perform abrupt movements from torso to wide positions. Use slow, gradual gestures so the model can maintain stable limb representation.
    6. Capture full upper body with enough side margin so hands never touch frame edges. Camera framing reference: https://learn.microsoft.com/azure/ai-services/speech-avatar/how-to/capture
    7. Azure Custom Avatars focus on facial realism, not full skeletal realism, meaning hand detail, occlusion handling, and wide gestures will always have limitations. Model scope: https://learn.microsoft.com/azure/ai-services/speech-avatar/concepts/limitations
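
    For step 4, you can tally the gesture share of your dataset with a short script like the one below. The gesture_*/talking_* file-naming convention is hypothetical, purely for illustration; adapt the globs to however your clips are organized:

    ```python
    # Sketch: estimate what fraction of total recorded time features
    # controlled hand movement, assuming a (hypothetical) naming scheme
    # where gesture clips match "gesture_*.mp4".
    import cv2
    from pathlib import Path

    def clip_seconds(path):
        cap = cv2.VideoCapture(str(path))
        frames = cap.get(cv2.CAP_PROP_FRAME_COUNT)
        fps = cap.get(cv2.CAP_PROP_FPS) or 30  # fall back if FPS is unreadable
        cap.release()
        return frames / fps

    gesture = sum(clip_seconds(p) for p in Path("training").glob("gesture_*.mp4"))
    total = sum(clip_seconds(p) for p in Path("training").glob("*.mp4"))
    print(f"gesture share: {gesture / max(total, 1):.0%} (target roughly 20-30%)")
    ```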

    I hope this is helpful! Do not hesitate to let me know if you have any other questions or clarifications.


    Please don't forget to close the thread by upvoting and accepting this as the answer if it was helpful.

    1 person found this answer helpful.
