Hi @Eugene Pugin, thanks for using the Microsoft Q&A platform.
I don't believe Azure Text-to-Speech (TTS) includes a built-in mechanism for checking the synthesized audio against the input text. However, the Azure Speech service has a feature called "pronunciation assessment" (part of its speech-to-text capability) that can be used to verify whether the synthesized audio matches the input text. Pronunciation assessment measures three aspects of pronunciation: accuracy, fluency, and completeness. When the audio is compared against the reference text, it can also detect errors such as extra, missing, or repeated words. That makes the scoring more accurate and gives you diagnostic information.
To use this feature, you provide both the reference text (your original TTS input) and the audio to assess (the synthesized speech). It then returns scores for how well the pronunciation in the audio matches that text. This can help you make sure that your application's voice guidance is accurate and of high quality.
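As a rough illustration of how such a check could work, here is a minimal Python sketch that reads a pronunciation assessment result and lists the words flagged as missing, extra, or mispronounced. Please treat it as a sketch under assumptions rather than an official sample: the JSON below is made up for illustration, the field names (`NBest`, `Words`, `PronunciationAssessment`, `ErrorType`) follow the detailed result format as I understand it, and the helper names and the completeness threshold are hypothetical. In a real application the JSON would come from the Speech SDK after running recognition with a `PronunciationAssessmentConfig` that has miscue detection enabled.

```python
import json

# Hypothetical sample shaped like a detailed pronunciation assessment result.
# The words and scores are invented for illustration, not real service output.
SAMPLE_RESULT = json.dumps({
    "NBest": [{
        "PronunciationAssessment": {
            "AccuracyScore": 92.0,
            "FluencyScore": 97.0,
            "CompletenessScore": 80.0,
        },
        "Words": [
            {"Word": "turn", "PronunciationAssessment": {"ErrorType": "None"}},
            {"Word": "left", "PronunciationAssessment": {"ErrorType": "None"}},
            {"Word": "at", "PronunciationAssessment": {"ErrorType": "Omission"}},
            {"Word": "the", "PronunciationAssessment": {"ErrorType": "None"}},
            {"Word": "next", "PronunciationAssessment": {"ErrorType": "Insertion"}},
            {"Word": "corner", "PronunciationAssessment": {"ErrorType": "Mispronunciation"}},
        ],
    }]
})


def summarize_assessment(result_json):
    """Return (overall scores, list of (word, error type) for flagged words)."""
    best = json.loads(result_json)["NBest"][0]
    scores = best.get("PronunciationAssessment", {})
    flagged = []
    for word in best.get("Words", []):
        error = word.get("PronunciationAssessment", {}).get("ErrorType", "None")
        if error != "None":
            flagged.append((word["Word"], error))
    return scores, flagged


def synthesis_matches_text(result_json, min_completeness=100.0):
    """Hypothetical check: no missing or extra words, and completeness
    at or above a threshold chosen by the caller (an assumption)."""
    scores, flagged = summarize_assessment(result_json)
    has_miscue = any(err in ("Omission", "Insertion") for _, err in flagged)
    return not has_miscue and scores.get("CompletenessScore", 0.0) >= min_completeness


if __name__ == "__main__":
    scores, flagged = summarize_assessment(SAMPLE_RESULT)
    print("Scores:", scores)
    print("Flagged words:", flagged)
    print("Matches input text:", synthesis_matches_text(SAMPLE_RESULT))
```

For TTS verification, the omitted and inserted words are probably the most useful signal, since they indicate that the audio does not contain exactly the words of the input text. The accuracy and fluency scores were designed for human speakers, so you may need to experiment to see how meaningful they are for synthesized voices.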
Please refer to this pronunciation assessment use cases page to see whether it fits your requirement. There are some limitations, so please read this page as well.
You can try out pronunciation assessment in Speech Studio.
I hope this helps.
Regards,
Vasavi