Hi James Withers,
Azure AI Speech's text-to-speech (TTS) service, including the en-US-NovaTurboMultilingualNeural voice, is designed to be consistent: the same input text with the same configuration is expected to produce effectively the same audio on every call. Unlike services such as ElevenLabs or OpenAI’s TTS, Azure does not offer a seed parameter or built-in stochasticity to introduce variability between generations.

Any variation therefore has to be introduced explicitly through SSML (Speech Synthesis Markup Language). You can adjust the pitch, rate, and volume attributes of the <prosody> element, or switch between the styles the voice supports through <mstts:express-as>, either by hand or programmatically. If you want each generation to sound slightly different, your code needs to randomize these SSML values on every request.

This design reflects Azure’s focus on business, accessibility, and production use cases, where reproducibility and control over synthesis are prioritized over random variation or creative interpretation. In short, Azure TTS provides powerful customization tools, but it has no native equivalent of a seed parameter.
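As a rough illustration of randomizing the SSML per request, here is a minimal Python sketch. The function name build_varied_ssml and the jitter ranges (up to 8% for rate, 5% for pitch) are my own assumptions, not recommended values, and I have not confirmed how strongly this particular voice responds to each <prosody> attribute, so please test it with your voice.

```python
import random
from xml.sax.saxutils import escape

# Voice name taken from the question; swap in whichever voice you use.
VOICE = "en-US-NovaTurboMultilingualNeural"


def build_varied_ssml(text, rng=None, max_rate_pct=8, max_pitch_pct=5):
    """Wrap text in SSML with small random rate and pitch offsets.

    Hypothetical helper: the default ranges are illustrative assumptions.
    """
    rng = rng or random.Random()
    # Pick a signed whole-percent offset for each attribute.
    rate = rng.randint(-max_rate_pct, max_rate_pct)
    pitch = rng.randint(-max_pitch_pct, max_pitch_pct)
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
        f'<voice name="{VOICE}">'
        f'<prosody rate="{rate:+d}%" pitch="{pitch:+d}%">'
        f"{escape(text)}"  # escape &, <, > so the SSML stays well-formed
        "</prosody></voice></speak>"
    )


if __name__ == "__main__":
    # Each call yields slightly different prosody values.
    print(build_varied_ssml("Thanks for calling. How can I help?"))
```

The resulting string can be sent wherever you already submit SSML, for example through the Speech SDK's speak_ssml_async method. Passing a seeded generator such as random.Random(42) as rng makes the chosen values repeatable, which gives you seed-like control on the client side even though the service itself has no seed.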
I hope this information helps.