I'm writing a validator for SSML files that should also accept files created by Microsoft Speech Studio. However, the SSML that Speech Studio generates does not seem to conform to the specification.
For example, when I add an intonation contour to a piece of text, the generated <prosody> element looks like this:
<prosody contour="(40%, +48%) (80%, -54%)">This is a test</prosody>
According to the W3C Recommendation, the contour attribute should be a set of whitespace-separated items in the format (time_position,target), with no whitespace inside the parentheses. The SSML documentation for the Microsoft Speech service says the same.
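To make the expected format concrete, here is a minimal sketch of a contour check, assuming a simplified reading of the W3C grammar. The time position is a percentage, and the target is a pitch value with an Hz, st or % unit or one of the W3C pitch labels. The names CONTOUR_POINT and is_valid_contour are hypothetical, and the pattern is deliberately looser than the spec: it makes the sign optional and does not enforce the 0% to 100% range.

```python
import re

# Hypothetical, simplified pattern for one contour point "(time_position,target)".
# Assumption: the sign on the target is optional here, which is looser than the spec,
# and the 0%..100% range of time_position is not enforced.
_NUMBER = r"[0-9]+(?:\.[0-9]+)?"
_TARGET = rf"[+-]?{_NUMBER}(?:Hz|st|%)|x-low|low|medium|high|x-high|default"

# No whitespace is allowed anywhere inside the parentheses.
CONTOUR_POINT = re.compile(rf"\({_NUMBER}%,(?:{_TARGET})\)")


def is_valid_contour(value: str) -> bool:
    """Return True if every whitespace-separated item is a well-formed contour point."""
    items = value.split()
    return bool(items) and all(CONTOUR_POINT.fullmatch(item) for item in items)


print(is_valid_contour("(40%,+48%) (80%,-54%)"))    # True
print(is_valid_contour("(40%, +48%) (80%, -54%)"))  # False: the space splits each point
```

The check splits on whitespace first and then matches each item, mirroring how a list-typed attribute is tokenized before its items are validated.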
The extra space matters because the W3C XSD schema defines the contour attribute as a list of contour points. In XML Schema, the items of a list type are always separated by whitespace, and this behavior cannot be changed. Therefore, my validator parses the contour attribute as a list of four items, "(40%,", "+48%)", "(80%," and "-54%)", instead of just two items, "(40%,+48%)" and "(80%,-54%)".
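The tokenization problem can be reproduced outside any validator. The following sketch assumes the standard behavior of XSD list types (whiteSpace="collapse"): runs of XML whitespace collapse to a single space, leading and trailing spaces are removed, and the value is split on spaces. The function name xsd_list_items is hypothetical and is not part of any library.

```python
import re


def xsd_list_items(value: str) -> list[str]:
    """Split a value the way an XSD list type does (whiteSpace="collapse")."""
    # XML whitespace is #x20, #x9, #xD and #xA; collapse runs of it to one space.
    collapsed = re.sub(r"[ \t\r\n]+", " ", value).strip(" ")
    return collapsed.split(" ") if collapsed else []


studio = "(40%, +48%) (80%, -54%)"  # as generated by Speech Studio
spec = "(40%,+48%) (80%,-54%)"      # as written in the W3C format

print(xsd_list_items(studio))  # ['(40%,', '+48%)', '(80%,', '-54%)']
print(xsd_list_items(spec))    # ['(40%,+48%)', '(80%,-54%)']
```

Because the split happens before any item is checked, a schema-based validator has no chance to treat "(40%, +48%)" as a single contour point.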
Is this a bug in Speech Studio, and can I expect it to be fixed any time soon? Or does Microsoft publish its own XSD schemas, which might also define the mstts extensions (such as mstts:express-as)?