Following up to see whether my "comment" answer above helped. You can find it in the comments section of this thread. Do let us know if you have any queries.
To reiterate the resolution here, let me jot down the gist of my comment answer above.
Yes, it is possible to synthesize speech with a Custom Neural Voice and receive viseme data in the same synthesis call with Azure Text to Speech, using the Speech SDK.
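To achieve this, set the audio format through the speech synthesis output format setting, for example "riff-16khz-16bit-mono-pcm" (WAV with a header) or "raw-16khz-16bit-mono-pcm" (headerless PCM); in the Speech SDK these are `Riff16Khz16BitMonoPcm` and `Raw16Khz16BitMonoPcm`. Note that both values are audio formats only and neither returns viseme data, and there is no "outputStyle" parameter for this. The viseme data is delivered through the synthesizer's viseme event (`viseme_received` in Python), which fires while the same call is producing the audio.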
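Below is a minimal sketch of that flow with the Speech SDK for Python (`azure-cognitiveservices-speech`). It is not the sample from the comments. It assumes you have a deployed Custom Neural Voice (its endpoint ID and voice name) and that viseme events are supported for your voice's locale. The helper names `viseme_record` and `synthesize_with_visemes` and the environment variable names are hypothetical placeholders.

```python
import os

TICKS_PER_MS = 10_000  # the SDK reports audio_offset in 100-nanosecond ticks


def viseme_record(audio_offset_ticks, viseme_id):
    """Convert one viseme event into (offset in milliseconds, viseme ID)."""
    return audio_offset_ticks / TICKS_PER_MS, viseme_id


def synthesize_with_visemes(text, key, region, endpoint_id, voice_name,
                            out_path="output.wav"):
    """Synthesize `text` with a Custom Neural Voice and collect visemes in the same call."""
    # Imported here so the helper above can be used without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    # Point the config at the Custom Neural Voice deployment.
    speech_config.endpoint_id = endpoint_id
    speech_config.speech_synthesis_voice_name = voice_name
    # The output format controls the audio only; visemes arrive as events.
    speech_config.set_speech_synthesis_output_format(
        speechsdk.SpeechSynthesisOutputFormat.Riff16Khz16BitMonoPcm
    )
    audio_config = speechsdk.audio.AudioOutputConfig(filename=out_path)
    synthesizer = speechsdk.SpeechSynthesizer(
        speech_config=speech_config, audio_config=audio_config
    )

    visemes = []
    # Fires for each viseme while the audio is being synthesized.
    synthesizer.viseme_received.connect(
        lambda evt: visemes.append(viseme_record(evt.audio_offset, evt.viseme_id))
    )

    result = synthesizer.speak_text_async(text).get()
    if result.reason != speechsdk.ResultReason.SynthesizingAudioCompleted:
        raise RuntimeError(f"Synthesis failed: {result.reason}")
    return visemes


if __name__ == "__main__":
    # Hypothetical environment variable names; substitute your own values.
    names = ("SPEECH_KEY", "SPEECH_REGION", "CNV_ENDPOINT_ID", "CNV_VOICE_NAME")
    settings = {name: os.environ.get(name) for name in names}
    if all(settings.values()):
        for offset_ms, viseme_id in synthesize_with_visemes(
            "Hello from my custom voice.",
            settings["SPEECH_KEY"], settings["SPEECH_REGION"],
            settings["CNV_ENDPOINT_ID"], settings["CNV_VOICE_NAME"],
        ):
            print(f"{offset_ms:8.1f} ms  viseme {viseme_id}")
    else:
        print("Set " + ", ".join(names) + " first.")
```

The audio lands in `output.wav` and the returned list gives each viseme ID with its offset into that audio, so both come from one `speak_text_async` call. Switching to `Raw16Khz16BitMonoPcm` would change only the audio container, not the viseme events.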
For working sample Python code and documentation, please refer to the comments section of this thread.
Please click 'Accept as answer' and 'Upvote' if it helped, so that it can help others in the community looking for help on similar topics.