Can I get viseme data via the Text-to-Speech REST API?

Question

Steffen Schreiber 25

Hello,

I would like to use the REST API via CURL in PHP in order to retrieve speech output as well as the corresponding viseme data.

I can successfully get speech data, but I'm not sure how to get viseme data.

I tried the following SSML to get visemes:

<speak version="1.0" xml:lang="en-US"><voice xml:lang="en-US" xml:gender="Female" name="de-DE-KatjaNeural"><mstts:viseme type="FacialExpression"/>Ich kann sprechen</voice></speak>

Unfortunately, this only produces an empty response to my curl request. Do I need to set a specific X-Microsoft-OutputFormat header? And can I get viseme data and audio data in a single call, or do these need to be separate calls (if viseme data is available at all via the REST API)?

Thanks and best regards,

Steffen

Here is my PHP code, which produces an empty response:

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://germanywestcentral.tts.speech.microsoft.com/cognitiveservices/v1");
curl_setopt($ch, CURLOPT_HTTPHEADER, [
	'Content-Type: application/ssml+xml',
	'Ocp-Apim-Subscription-Key: ' . $API_KEY,
	// the following output format works for getting speech data, but not for visemes
	'X-Microsoft-OutputFormat: audio-16khz-128kbitrate-mono-mp3',
	'User-Agent: curl'
]);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt_array($ch, array(
	CURLOPT_POST => 1,
	CURLOPT_POSTFIELDS => '<speak version="1.0" xml:lang="en-US"><voice xml:lang="en-US" xml:gender="Female" name="de-DE-KatjaNeural"><mstts:viseme type="FacialExpression"/>Ich kann sprechen</voice></speak>',
));
// fclose($fp);
$response = curl_exec($ch);

VasaviLankipalle-MSFT 18,676 Reputation points Moderator

2023-03-21T03:05:19.54+00:00

Hi @Steffen Schreiber , did you get a chance to check my response?

Answer 1

Hi @Steffen Schreiber , Thanks for using Microsoft Q&A Platform.

Unfortunately, viseme events are only available through the Speech SDK, not through the REST API. The SDK is available for C++, C#, Java, JavaScript, and Python. You can read about the VisemeReceived event in the Speech SDK here: https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-speech-synthesis-viseme?tabs=visemeid&pivots=programming-language-python
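
As a rough illustration of the SDK route, here is a minimal Python sketch that collects the audio and the viseme events from a single synthesis call. It is not an official sample. The helper names `build_viseme_ssml` and `synthesize_with_visemes` are made up for this example. The SSML declares the `mstts` namespace and uses `de-DE` as the language, which the SSML in the question does not, and it leaves out the `xml:lang` and `xml:gender` attributes on the `voice` element. Whether a given voice supports the `FacialExpression` animation type is an assumption you should check against the documentation linked above.

```python
from xml.sax.saxutils import escape, quoteattr


def build_viseme_ssml(text, voice="de-DE-KatjaNeural", lang="de-DE",
                      viseme_type="FacialExpression"):
    """Return SSML that requests viseme events for `text` (hypothetical helper)."""
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="http://www.w3.org/2001/mstts" '  # needed for <mstts:viseme>
        f'xml:lang={quoteattr(lang)}>'
        f'<voice name={quoteattr(voice)}>'
        f'<mstts:viseme type={quoteattr(viseme_type)}/>'
        f'{escape(text)}'  # escape &, < and > in the spoken text
        '</voice></speak>'
    )


def synthesize_with_visemes(ssml, key, region):
    """Return (audio_bytes, visemes) from one SDK call (hypothetical helper)."""
    # Imported lazily so the SSML helper works without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    config = speechsdk.SpeechConfig(subscription=key, region=region)
    # Same MP3 format as the X-Microsoft-OutputFormat header in the question.
    config.set_speech_synthesis_output_format(
        speechsdk.SpeechSynthesisOutputFormat.Audio16Khz128KBitRateMonoMp3)
    # audio_config=None keeps the audio in memory instead of playing it.
    synthesizer = speechsdk.SpeechSynthesizer(speech_config=config,
                                              audio_config=None)

    visemes = []

    def on_viseme(evt):
        visemes.append({
            "offset_ms": evt.audio_offset / 10000,  # offset is in 100 ns ticks
            "viseme_id": evt.viseme_id,
            "animation": evt.animation,  # blend-shape JSON chunk, may be empty
        })

    # Subscribe before starting synthesis so no events are missed.
    synthesizer.viseme_received.connect(on_viseme)

    result = synthesizer.speak_ssml_async(ssml).get()
    if result.reason != speechsdk.ResultReason.SynthesizingAudioCompleted:
        raise RuntimeError(f"Synthesis did not complete: {result.reason}")
    return result.audio_data, visemes
```

With the `azure-cognitiveservices-speech` package installed, calling `synthesize_with_visemes(build_viseme_ssml("Ich kann sprechen"), key, "germanywestcentral")` with your own key should return the MP3 bytes together with the list of viseme events. A PHP application could run such a script as a separate process or small service, since there is no PHP SDK.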

I hope this helps. Let me know if you have any questions.

Regards,
Vasavi

Please accept the answer and vote 'Yes' if you found it helpful, to support the community. Thanks.
