Thanks for your answer, @Ramr-msft. I am a developer at a company, and we do indeed want to animate 3D characters. We also need real-time visemes.
To be more precise, we are looking for something like this:
{"time":0,"type":"sentence","start":0,"end":23,"value":"Mary had a little lamb."}
{"time":6,"type":"word","start":0,"end":4,"value":"Mary"}
{"time":6,"type":"viseme","value":"p"}
{"time":73,"type":"viseme","value":"E"}
{"time":180,"type":"viseme","value":"r"}
{"time":292,"type":"viseme","value":"i"}
...
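To show how we would consume such a stream, here is a minimal sketch in Python. It assumes the events arrive as one JSON object per line, exactly as in the sample above, and that "time" is an offset from the start of the audio (the unit is my assumption, probably milliseconds). The function name viseme_timeline is just an illustrative helper, not part of any SDK.

```python
import json

# Sample events copied from the lines above (one JSON object per line).
SAMPLE = """\
{"time":0,"type":"sentence","start":0,"end":23,"value":"Mary had a little lamb."}
{"time":6,"type":"word","start":0,"end":4,"value":"Mary"}
{"time":6,"type":"viseme","value":"p"}
{"time":73,"type":"viseme","value":"E"}
{"time":180,"type":"viseme","value":"r"}
{"time":292,"type":"viseme","value":"i"}
"""


def viseme_timeline(lines):
    """Return (time, viseme) pairs from newline-delimited JSON events.

    Hypothetical helper: it keeps only events whose "type" is "viseme"
    and ignores sentence and word events.
    """
    timeline = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between events
        event = json.loads(line)
        if event.get("type") == "viseme":
            timeline.append((event["time"], event["value"]))
    return timeline


timeline = viseme_timeline(SAMPLE.splitlines())
for time_offset, viseme in timeline:
    print(time_offset, viseme)
```

Each pair would then drive one mouth shape on the 3D character at the given offset.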
Your solution for getting facial pose events may interest us, but we need it for more than just the en-US-AriaNeural voice. In particular, a French voice is important for us.
Hence my question: is there an approximate date for when this functionality will be available in other languages?
Thanks,
gma