@Siju Vijayan I think you are referring to the custom avatar mentioned in the SDK repo. Access to custom text to speech avatar is limited based on eligibility and usage criteria, so you need to request access through the intake form. Even with a custom avatar, a person's image is required, and the avatar is created from that training data.
The look of the avatar: The custom text to speech avatar looks the same as the avatar talent in the training data, and we don't support customizing the appearance of the avatar model, such as clothes, hairstyle, etc. So if your application requires multiple styles of the same avatar, you should prepare training data for each style, as each style of an avatar will be considered as a single avatar model.
Also, I think the coordinates are used to define the crop area when video crop is selected in the custom avatar configuration.
Video Crop - By checking this, you can crop the video stream from server side to a smaller size. This is useful when you want to put the avatar video into a customized rectangle area.
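To illustrate how such crop coordinates could be worked out, here is a minimal sketch. It assumes the uncropped avatar video is 1920x1080 and that the crop is described by a top-left and a bottom-right point. The `Coordinate` type and the `centered_crop` helper are hypothetical names used only for this illustration and are not part of the SDK.

```python
from typing import NamedTuple, Tuple


class Coordinate(NamedTuple):
    """A pixel position in the video frame (hypothetical helper type)."""
    x: int
    y: int


def centered_crop(frame_width: int, frame_height: int,
                  crop_width: int) -> Tuple[Coordinate, Coordinate]:
    """Return (top_left, bottom_right) for a full-height, horizontally
    centered crop of the given width."""
    if not 0 < crop_width <= frame_width:
        raise ValueError("crop_width must be between 1 and frame_width")
    left = (frame_width - crop_width) // 2
    return Coordinate(left, 0), Coordinate(left + crop_width, frame_height)


# Assumed source frame of 1920x1080, cropped to a 720-pixel-wide column.
top_left, bottom_right = centered_crop(1920, 1080, 720)
print(top_left, bottom_right)
```

The two points would then be the values you supply as the crop coordinates in the avatar configuration, if the configuration you are using accepts a top-left and bottom-right pair.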
If this answers your query, please click Accept Answer and Yes for "Was this answer helpful". If you have any further queries, do let us know.