Can I get viseme data via the Text-to-Speech REST API?

Steffen Schreiber 25 Reputation points
2023-03-17T13:55:28.1933333+00:00

Hello,

I would like to use the REST API via CURL in PHP in order to retrieve speech output as well as the corresponding viseme data.

I can successfully get speech data, but I'm not sure how to get viseme data.

I tried the following SSML to get visemes:

<speak version="1.0" xml:lang="en-US"><voice xml:lang="en-US" xml:gender="Female" name="de-DE-KatjaNeural"><mstts:viseme type="FacialExpression"/>Ich kann sprechen</voice></speak>

Unfortunately, this only produces an empty result in my cURL request. Do I need to set a specific X-Microsoft-OutputFormat header? And can I get viseme data and audio data in a single call, or do these need to be separate calls (if viseme data is available via the REST API at all)?

Thanks and best regards,

Steffen

Here is my PHP code, which produces an empty response:

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://germanywestcentral.tts.speech.microsoft.com/cognitiveservices/v1");
curl_setopt($ch, CURLOPT_HTTPHEADER, [
    'Content-Type: application/ssml+xml',
    'Ocp-Apim-Subscription-Key: ' . $API_KEY,
    // the following output format works for getting speech data, but not for visemes
    'X-Microsoft-OutputFormat: audio-16khz-128kbitrate-mono-mp3',
    'User-Agent: curl'
]);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt_array($ch, array(
    CURLOPT_POST => 1,
    CURLOPT_POSTFIELDS => '<speak version="1.0" xml:lang="en-US"><voice xml:lang="en-US" xml:gender="Female" name="de-DE-KatjaNeural"><mstts:viseme type="FacialExpression"/>Ich kann sprechen</voice></speak>',
));
// fclose($fp);
$response = curl_exec($ch);
Azure AI Speech

1 answer

  1. VasaviLankipalle-MSFT 15,946 Reputation points
    2023-03-18T01:25:40.82+00:00

    Hi @Steffen Schreiber, thanks for using the Microsoft Q&A platform.

    Unfortunately, viseme events are available only through the Speech SDK, not through the REST API. The SDK is available for C++, C#, Java, JavaScript, and Python. You can refer to the VisemeReceived event in the Speech SDK here: https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-speech-synthesis-viseme?tabs=visemeid&pivots=programming-language-python

    I hope this helps. Let me know if you have any questions.

    Regards,
    Vasavi

    Please accept the answer and vote 'Yes' if you found it helpful, to support the community. Thanks.
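
For readers who want to try the SDK route the answer describes, here is a minimal Python sketch. It assumes the `azure-cognitiveservices-speech` package is installed. The helper names `build_ssml` and `synthesize_with_visemes` and the `SPEECH_KEY` environment variable are made up for this illustration; the region and voice are taken from the question. Note that the SSML in the question uses the `mstts:` prefix without declaring it, so the helper adds the `xmlns` and `xmlns:mstts` declarations that the documented examples use. The sketch does not check whether a given voice or locale supports a given viseme type.

```python
import os
from xml.sax.saxutils import escape

SSML_NS = "http://www.w3.org/2001/10/synthesis"
MSTTS_NS = "http://www.w3.org/2001/mstts"


def build_ssml(text, voice="de-DE-KatjaNeural", lang="de-DE", viseme_type=None):
    """Return SSML that declares the mstts namespace (hypothetical helper)."""
    # The <mstts:viseme> element is only emitted when a type is requested,
    # for example "FacialExpression" as in the question.
    viseme = f'<mstts:viseme type="{viseme_type}"/>' if viseme_type else ""
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xmlns:mstts="{MSTTS_NS}" '
        f'xml:lang="{lang}">'
        f'<voice name="{voice}">{viseme}{escape(text)}</voice></speak>'
    )


def synthesize_with_visemes(ssml, key, region):
    """Synthesize SSML and return (audio_bytes, list_of_viseme_events)."""
    # Imported lazily so build_ssml can be used without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    config = speechsdk.SpeechConfig(subscription=key, region=region)
    # audio_config=None keeps the audio in memory instead of playing it.
    synthesizer = speechsdk.SpeechSynthesizer(speech_config=config, audio_config=None)

    visemes = []
    # The SDK raises viseme_received while synthesizing; audio_offset is in
    # 100-nanosecond ticks, so dividing by 10,000 yields milliseconds.
    synthesizer.viseme_received.connect(
        lambda evt: visemes.append(
            {
                "offset_ms": evt.audio_offset / 10000,
                "viseme_id": evt.viseme_id,
                "animation": evt.animation,
            }
        )
    )

    result = synthesizer.speak_ssml_async(ssml).get()
    if result.reason != speechsdk.ResultReason.SynthesizingAudioCompleted:
        raise RuntimeError(f"Synthesis did not complete: {result.reason}")
    return result.audio_data, visemes


# Runs only when a key is supplied via the (assumed) SPEECH_KEY variable.
if __name__ == "__main__" and os.environ.get("SPEECH_KEY"):
    ssml = build_ssml("Ich kann sprechen")
    audio, visemes = synthesize_with_visemes(
        ssml, os.environ["SPEECH_KEY"], "germanywestcentral"
    )
    print(len(audio), "bytes of audio;", len(visemes), "viseme events")
```

With this approach, the audio and the viseme events come from a single synthesis call, which addresses the second half of the question. A PHP application would need to call a small helper service or script written in one of the SDK's supported languages.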
