How can I add variability to TTS in Azure AI Speech?

James Withers

I'm using the en-US-NovaTurboMultilingualNeural voice in Azure AI Speech's text-to-speech (TTS) service.

When I've used other TTS services (e.g. OpenAI, ElevenLabs), each generation of speech results in a slightly different reading. ElevenLabs even has a seed parameter to control this variability.

I realise that I can use SSML to customise styles, stress, etc., but is there any way (e.g. a seed parameter) to automatically get a different reading of the same text input? Or is this TTS service designed to be completely reproducible and controllable?

Comment by James Withers, 2025-05-04T08:58:34.11+00:00

The main reason I'm using Azure AI Speech is its phoneme/viseme ID output, which other services lack, so moving to something else isn't feasible.

Answer 1

Hi James Withers,

In Azure AI Speech's text-to-speech (TTS) service, including the en-US-NovaTurboMultilingualNeural voice, synthesis is designed to be consistent: the same input text with the same voice and configuration produces the same reading each time. Unlike ElevenLabs (which exposes a seed parameter) or OpenAI's TTS (whose readings vary between generations), Azure does not offer a seed parameter or built-in randomness to vary the output between generations.

Instead, any variability has to be introduced explicitly through SSML (Speech Synthesis Markup Language), by adjusting attributes such as pitch, rate, volume, or style, either manually or programmatically. For example, you can vary the <prosody> settings, or try the different <mstts:express-as> styles the voice supports, to create slight differences in the output. If you want each generation to sound slightly different, you need to randomize these SSML parameters yourself on each request.

This design reflects Azure's focus on business, accessibility, and production use cases, where reproducibility and control over speech synthesis take priority over random variability or creative interpretation. So while Azure TTS provides powerful customization tools, it does not natively support automatic variability across calls the way a seed parameter would.
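As a rough illustration of randomizing SSML per request, here is a minimal sketch in plain Python (no SDK calls). It wraps the text in a <prosody> element with a small random rate and pitch offset and, optionally, picks one style from a list you supply. Passing a seed to random.Random gives you something like the seed parameter mentioned in the question: the same seed rebuilds the same SSML, and a different seed gives a different variant. The function name build_varied_ssml, the jitter ranges, and any style names are hypothetical choices for illustration, not part of Azure's API. Not every voice supports express-as styles, so check your voice's supported styles before passing any. Volume could be jittered the same way.

```python
import random
from xml.sax.saxutils import escape

# Hypothetical jitter ranges (in percent); tune them by ear.
RATE_RANGE = (-8, 8)    # relative change in speaking rate, e.g. "+3%"
PITCH_RANGE = (-5, 5)   # relative change in pitch, e.g. "-2%"


def build_varied_ssml(text, voice="en-US-NovaTurboMultilingualNeural",
                      seed=None, styles=None, lang="en-US"):
    """Return an SSML string with randomly jittered prosody (hypothetical helper).

    The same seed always rebuilds the same SSML string; seed=None draws
    fresh randomness on every call.
    """
    rng = random.Random(seed)  # independent generator, so the seed is local
    rate = rng.randint(*RATE_RANGE)    # inclusive on both ends
    pitch = rng.randint(*PITCH_RANGE)

    # Escape &, < and > so arbitrary input text stays well-formed XML.
    body = (f'<prosody rate="{rate:+d}%" pitch="{pitch:+d}%">'
            f'{escape(text)}</prosody>')

    if styles:
        # Only pass styles the chosen voice actually supports (assumption).
        style = escape(rng.choice(styles), {'"': "&quot;"})
        body = f'<mstts:express-as style="{style}">{body}</mstts:express-as>'

    return ('<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
            'xmlns:mstts="https://www.w3.org/2001/mstts" '
            f'xml:lang="{lang}">'
            f'<voice name="{voice}">{body}</voice></speak>')


if __name__ == "__main__":
    # Three candidate "takes" of the same sentence, one per seed.
    for take in range(3):
        print(build_varied_ssml("Welcome back! Shall we continue?", seed=take))
```

You would then send each string to the Speech SDK with SpeechSynthesizer.speak_ssml_async(ssml).get(). Viseme events (viseme_received) are also raised for SSML input, so the viseme IDs should still be available, although their timings will shift with the rate changes.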

I hope this information helps.
