Have you tried using the "break" tag? https://learn.microsoft.com/en-us/azure/cognitive-services/Speech-Service/speech-synthesis-markup-structure#add-a-break
As the "Add Silence" section of that page says:
"One of the differences between mstts:silence and break is that a break element can be inserted anywhere in the text. Silence only works at the beginning or end of input text or at the boundary of two adjacent sentences."
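
In case it helps, here is a rough sketch of what a mid-sentence pause could look like. The helper name build_ssml, the sample text, the 750 ms pause and the voice en-US-JennyNeural are just placeholders I picked for illustration, so swap in your own. It only builds the SSML string and checks that it is well-formed XML; it does not call the Speech service.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(before: str, after: str, pause_ms: int = 500,
               voice: str = "en-US-JennyNeural") -> str:
    """Return SSML with a <break> between two pieces of text.

    Hypothetical helper for illustration; the voice name is an assumption.
    """
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang="en-US">'
        f'<voice name="{escape(voice)}">'
        # The break sits in the middle of the sentence, which is the case
        # the docs say mstts:silence does not cover.
        f'{escape(before)} <break time="{pause_ms}ms"/> {escape(after)}'
        '</voice></speak>'
    )


ssml = build_ssml("Welcome", "to text to speech.", pause_ms=750)
ET.fromstring(ssml)  # raises ParseError if the markup is not well-formed
print(ssml)
```

You would then pass the resulting string to whatever SSML entry point your SDK exposes instead of the plain-text one.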