How can I add variability to TTS in Azure AI Speech?

James Withers 20 Reputation points
2025-05-04T08:49:28.3566667+00:00

I'm using the en-US-NovaTurboMultilingualNeural voice in Azure AI Speech's text-to-speech (TTS) service.

When I've used other TTS services (e.g. OpenAI, ElevenLabs), each generation of speech results in a slightly different reading. ElevenLabs even has a seed parameter to control this variability.

I realise that I can use SSML to customise styles, stress, and so on, but is there any way (e.g. a seed parameter) to automatically get a different reading of my text input? Or is this TTS service designed to be completely reproducible and controllable?

Azure AI Speech
An Azure service that integrates speech processing into apps and services.

Accepted answer
  1. Pavankumar Purilla 8,335 Reputation points Microsoft External Staff Moderator
    2025-05-06T00:30:24.3566667+00:00

    Hi James Withers,

    Azure AI Speech's text-to-speech (TTS) service, including the en-US-NovaTurboMultilingualNeural voice, is designed to be deterministic: the same input text with the same configuration produces the same audio output. Unlike services such as ElevenLabs or OpenAI's TTS, Azure does not offer a seed parameter or any built-in stochasticity to introduce automatic variability between generations.

    Instead, any variability must be introduced explicitly through SSML (Speech Synthesis Markup Language), by adjusting attributes such as pitch, rate, volume, or style, either manually or programmatically. For example, you can vary the <prosody> settings or try the different <mstts:express-as> styles that the voice supports to create slight differences in the speech output. If you want each generation to sound slightly different, you would need to implement a method that randomly adjusts these SSML parameters for each request.

    This design reflects Azure's focus on business, accessibility, and production use cases, where reproducibility and control over speech synthesis take priority over random variability or creative interpretation. So while Azure TTS provides powerful tools for customization, it does not natively support automatic variability across calls in the way a seed parameter would.
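The suggestion above, randomly adjusting SSML parameters for each request, could be sketched roughly as follows. This is a minimal illustration, not an official Azure feature: the function name `build_varied_ssml`, the `seed` argument, and the jitter ranges are all assumptions made for this example, and whether small prosody changes sound natural with this particular voice would need to be checked by ear.

```python
import random
from xml.sax.saxutils import escape

# Voice name taken from the question. The jitter ranges below are
# illustrative assumptions, not values recommended by Azure.
VOICE = "en-US-NovaTurboMultilingualNeural"


def build_varied_ssml(text, seed=None, max_rate_pct=8.0, max_pitch_pct=4.0):
    """Return SSML whose <prosody> rate and pitch are randomly nudged.

    Passing the same seed reproduces the same SSML, which emulates a
    "seed" parameter on the client side. With seed=None, each call
    draws fresh random values.
    """
    rng = random.Random(seed)
    rate = rng.uniform(-max_rate_pct, max_rate_pct)
    pitch = rng.uniform(-max_pitch_pct, max_pitch_pct)
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xml:lang="en-US">'
        f'<voice name="{VOICE}">'
        # Relative percentages, e.g. rate="+3.2%" pitch="-1.5%"
        f'<prosody rate="{rate:+.1f}%" pitch="{pitch:+.1f}%">'
        f"{escape(text)}"  # escape &, <, > so the SSML stays well-formed
        "</prosody></voice></speak>"
    )


if __name__ == "__main__":
    # Two different seeds give two slightly different prosody settings.
    print(build_varied_ssml("Hello, world.", seed=1))
    print(build_varied_ssml("Hello, world.", seed=2))
```

The resulting string can then be sent to the service like any other SSML, for example through `SpeechSynthesizer.speak_ssml_async` in the Speech SDK for Python. Because the randomness lives entirely in the client, logging the seed alongside each request is enough to regenerate a reading you liked.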

    I hope this information helps.
