Can the Azure TTS API use both Custom Neural Voice and facial positions in blend shapes?

wave test 20 Reputation points
2023-09-23T01:52:40.59+00:00

I know the Azure TTS API can get facial positions with visemes (3D blend shapes):

https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-speech-synthesis-viseme?tabs=visemeid&pivots=programming-language-javascript

and that it can also use a Custom Neural Voice:

https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-synthesis-markup-voice#use-speaking-styles-and-roles

But is it possible to specify a Custom Neural Voice and get viseme data back in one API call?


Accepted answer
  dupammi 8,615 Reputation points, Microsoft External Staff
    2023-09-27T03:59:58.1733333+00:00

    @wave test,

    Following up to see whether my comment above answered your question; please check the comments section of this thread. Do let us know if you have any queries.

    To reiterate the resolution, here is the gist of that comment.

    Yes, it is possible to use a Custom Neural Voice and receive viseme data in a single Azure TTS call.

    To achieve this, configure the synthesizer with your Custom Neural Voice (set the deployment's endpoint ID and the voice name) and subscribe to the viseme event, then include the mstts:viseme element with type="FacialExpression" in your SSML. The viseme events, including the 3D blend-shapes animation frames, are raised during the same synthesis request that produces the audio, so one call returns both.

    For working sample Python code and documentation, please refer to the comments section of this thread; a minimal sketch is also shown below.
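
    Below is a minimal, untested sketch of that approach, assuming the Azure Speech SDK for Python (azure-cognitiveservices-speech). The key, region, endpoint ID, and voice name are placeholder values for illustration, not real deployment values.

    ```python
    # A minimal sketch, assuming the Azure Speech SDK for Python
    # (pip install azure-cognitiveservices-speech). The key, region,
    # endpoint ID, and voice name below are placeholders.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(
        subscription="YOUR_SPEECH_KEY", region="YOUR_REGION")
    # Point the synthesizer at your Custom Neural Voice deployment.
    speech_config.endpoint_id = "YOUR_CUSTOM_VOICE_DEPLOYMENT_ID"
    speech_config.speech_synthesis_voice_name = "YourCustomNeuralVoiceName"

    # audio_config=None keeps the audio in result.audio_data instead of
    # playing it through the default speaker.
    synthesizer = speechsdk.SpeechSynthesizer(
        speech_config=speech_config, audio_config=None)

    def on_viseme(evt):
        # evt.viseme_id and evt.audio_offset describe the standard viseme
        # events; evt.animation carries the blend-shapes JSON frames when
        # the SSML requests FacialExpression output.
        print(f"Viseme {evt.viseme_id} at {evt.audio_offset / 10000:.0f} ms")
        if evt.animation:
            print("Blend shapes:", evt.animation)

    synthesizer.viseme_received.connect(on_viseme)

    # One synthesis call: the custom voice renders the audio while the
    # mstts:viseme element requests 3D blend-shape facial positions.
    ssml = """
    <speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis'
           xmlns:mstts='http://www.w3.org/2001/mstts' xml:lang='en-US'>
      <voice name='YourCustomNeuralVoiceName'>
        <mstts:viseme type='FacialExpression'/>
        Hello, this is my custom neural voice with viseme output.
      </voice>
    </speak>
    """
    result = synthesizer.speak_ssml_async(ssml).get()
    if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
        print(f"Synthesized {len(result.audio_data)} bytes of audio.")
    ```

    The single speak_ssml_async call drives both outputs: the audio bytes come back on the result, and the blend-shape frames arrive through the viseme_received event as synthesis progresses.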

    Please 'Accept as answer' and 'Upvote' if this helped, so that it can help others in the community looking for help on similar topics.

