Is each voice in the voice gallery based on a clone of one specific natural person or is it synthetic?

Question

I would like to understand whether:

Each voice in the voice gallery is based on a clone of one specific natural person?
Voices are synthetic (similar to those from 11Labs Voice Design) that cannot be traced back to an individual person?

Thank you!

Answer

Hi @mpsb,

Thank you for reaching out to the Microsoft Q&A forum!

Voices in the Azure Voice Gallery are primarily synthetic, created using advanced machine learning and AI techniques. While some voices may be based on recordings of a specific natural person's voice to achieve a particular tone or style, the resulting voice models are typically synthetic.

Some voices are created from recordings of a specific person's voice, capturing their unique characteristics but often modified during synthesis.

Other voices are entirely AI-generated, not based on any real person, designed to sound natural but without a real-world counterpart.

In practice, Azure may use both methods depending on the specific requirements and design goals of each voice. Some voices might be designed to offer a high degree of naturalness by starting with recordings of a specific person, while others might be generated entirely synthetically for broader flexibility and privacy considerations.

For more info see: Code of conduct for Azure AI Speech.

Hope this helps. Do let us know if you have any further queries.

If this answers your query, please click "Accept Answer" and select "Yes" for "Was this answer helpful?".
