jakluk asked GiftA-MSFT commented

Validity of SSML generated by the Speech Studio


I'm writing a validator for SSML files, which should support files created by Microsoft Speech Studio. However, it seems that the generated SSML doesn't conform to the specification.

For example, when I add an intonation contour to a piece of text, the generated <prosody> block looks like <prosody contour="(40%, +48%) (80%, -54%)">This is a test</prosody>.

According to the W3C Recommendation, the contour attribute should be a set of space-separated items in the format (time_position,target), with no whitespace inside the parentheses. The SSML documentation for the Microsoft Speech service says the same.

The additional space matters because the contour attribute is defined as a list of contour points in the W3C XSD schema. In XML Schema, the items of a list type are always separated by whitespace, and this behavior cannot be changed. Therefore, my validator parses the generated contour attribute as a list of four items, "(40%,", "+48%)", "(80%," and "-54%)", instead of the two items (40%,+48%) and (80%,-54%).
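
To illustrate the splitting, here is a minimal Python sketch. It is not my actual validator: the function names xsd_list_tokens and normalize_contour are made up for this example, and str.split() only approximates XSD list tokenization (it splits on any run of whitespace). The second function shows one possible lenient workaround, assuming it is acceptable to strip whitespace inside each parenthesized pair before validating.

```python
import re


def xsd_list_tokens(value: str) -> list[str]:
    """Approximate XSD list tokenization: split on runs of whitespace."""
    return value.split()


def normalize_contour(value: str) -> str:
    """Remove whitespace inside each parenthesized pair (hypothetical workaround)."""
    return re.sub(
        r"\(([^()]*)\)",
        lambda m: "(" + re.sub(r"\s+", "", m.group(1)) + ")",
        value,
    )


studio_value = "(40%, +48%) (80%, -54%)"

# As generated by Speech Studio: the space after each comma splits every pair in two.
print(xsd_list_tokens(studio_value))
# ['(40%,', '+48%)', '(80%,', '-54%)']

# After normalization: two contour points, as the specification describes.
print(xsd_list_tokens(normalize_contour(studio_value)))
# ['(40%,+48%)', '(80%,-54%)']
```

Normalizing before validation would let the validator accept the Speech Studio output, but the generated attribute itself would still differ from the documented format.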

Is this a bug in Speech Studio, and can I expect it to be fixed anytime soon? Or does Microsoft have its own XSD schemas, which could also define the mstts extensions (such as mstts:express-as)?

Hi, we will forward your feedback to the product group and share updates soon. Thanks.

