@fnx The usage looks correct with respect to the attributes supported by Azure text to speech. I suspect you are not hearing a noticeable difference because of the particular voice used in your testing. I have tested this scenario with the same sentence in the Speech Studio audio content creation feature. Here are the results for the following SSML inputs.
Normal:
<speak xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:mstts="http://www.w3.org/2001/mstts"
       xmlns:emo="http://www.w3.org/2009/10/emotionml"
       version="1.0" xml:lang="en-US">
  <voice name="Microsoft Server Speech Text to Speech Voice (en-US, ChristopherNeural)">test sentence</voice>
</speak>
With intonation applied:
<speak xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:mstts="http://www.w3.org/2001/mstts"
       xmlns:emo="http://www.w3.org/2009/10/emotionml"
       version="1.0" xml:lang="en-US">
  <voice name="Microsoft Server Speech Text to Speech Voice (en-US, ChristopherNeural)">
    <prosody contour="(1%, +85%)">test sentence</prosody>
  </voice>
</speak>
Could you please try the scenario with ChristopherNeural using the above SSML inputs? In Speech Studio you can set any of the input parameters by drag and drop instead of manually editing the SSML. Due to the limit on files that can be attached to this thread, I am unable to upload the audio files for the above inputs. Thanks!
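If you would rather reproduce this outside of Speech Studio, here is a minimal sketch using the Speech SDK for Python. The subscription key, region, and output file name are placeholders you would replace with your own values; the SSML string is the intonation example from above.

import azure.cognitiveservices.speech as speechsdk

# Placeholder credentials -- replace with your own Speech resource key and region.
speech_config = speechsdk.SpeechConfig(subscription="YOUR_SPEECH_KEY", region="YOUR_REGION")

# Write the synthesized audio to a local WAV file ("contour_test.wav" is an example name).
audio_config = speechsdk.audio.AudioOutputConfig(filename="contour_test.wav")
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)

# The SSML with the prosody contour applied, as in the second example above.
ssml = """<speak xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:mstts="http://www.w3.org/2001/mstts"
       xmlns:emo="http://www.w3.org/2009/10/emotionml"
       version="1.0" xml:lang="en-US">
  <voice name="Microsoft Server Speech Text to Speech Voice (en-US, ChristopherNeural)">
    <prosody contour="(1%, +85%)">test sentence</prosody>
  </voice>
</speak>"""

result = synthesizer.speak_ssml_async(ssml).get()
if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
    print("Synthesis complete; audio written to contour_test.wav")
elif result.reason == speechsdk.ResultReason.Canceled:
    # Surface the error details if synthesis was canceled (e.g. a bad key or malformed SSML).
    cancellation_details = result.cancellation_details
    print("Synthesis canceled:", cancellation_details.reason)
    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        print("Error details:", cancellation_details.error_details)

Running this once per SSML input (normal and with the contour) gives you two WAV files you can compare locally, which works around the attachment limit on this thread.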