Is there any way to dub audio while maintaining its original intonation, breaks, and speed?

Lucas

I have a voice recording with a lot of low and high tones, some breaks, and "word emphasis" at specific moments. However, when I use the "Speech Translation" functionality, the audio loses all of its life (all of this complexity) and becomes an obviously monotone, AI-generated voice.

I've seen that the "Audio Content Creation" functionality makes it possible to fine-tune and "correct" this problem (adding breaks and intonation, and reducing or increasing speed). As far as I'm aware, though, it still requires manual work on each individual phrase (or word), which becomes unbearable for long recordings.

Having said that, here are my questions:

Is there any way to dub my voice recording directly while maintaining its tones, breaks, and speaking rate?

If that's not possible, are there any options that do not require manual fine-tuning?
Maybe some sort of Speech-to-Text that transcribes the original audio and annotates its <contour>, <breaks>, etc., so that this "well-informed transcription" could later be used as the basis for generating voices in other languages while still preserving all of the "life" of the original voice.
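A rough sketch of the timing half of that idea, assuming the speech-to-text step can return per-word start offsets and durations in milliseconds. The `TimedWord` format, the helper names, and the 200 ms pause threshold are hypothetical choices for illustration: silences between consecutive words become pause candidates, and the words-per-minute figure gives a baseline speaking rate. Pitch contour would need a separate pitch-analysis step, which this does not attempt.

```python
from dataclasses import dataclass


@dataclass
class TimedWord:
    """One recognized word with start offset and duration in ms (hypothetical format)."""
    text: str
    offset_ms: int
    duration_ms: int


def find_pauses(words, min_gap_ms=200):
    """Return (index, gap_ms) for each silence of at least min_gap_ms after words[index]."""
    pauses = []
    for i in range(len(words) - 1):
        end_ms = words[i].offset_ms + words[i].duration_ms
        gap_ms = words[i + 1].offset_ms - end_ms
        if gap_ms >= min_gap_ms:
            pauses.append((i, gap_ms))
    return pauses


def words_per_minute(words):
    """Average speaking rate over the whole span, pauses included."""
    if not words:
        return 0.0
    span_ms = words[-1].offset_ms + words[-1].duration_ms - words[0].offset_ms
    return len(words) * 60000 / span_ms if span_ms > 0 else 0.0


if __name__ == "__main__":
    sample = [TimedWord("Hello", 0, 400), TimedWord("there", 450, 350), TimedWord("friend", 1300, 500)]
    print(find_pauses(sample))       # [(1, 500)]
    print(words_per_minute(sample))  # 100.0
```

The detected gaps could then be emitted as explicit pause markers in the translated text, and the rate compared per phrase against the baseline, instead of adjusting each phrase by hand.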

My main concern is not depending on manual adjustments; anything that solves this would be sufficient.

Thanks in advance; I hope to find a solution here.

santoshkc 4,925 Reputation points Microsoft Vendor

2024-05-03T09:13:24.55+00:00

Hi @Lucas,

Thank you for reaching out to the Microsoft Q&A forum!

I understand that you are looking for a way to preserve the intonation and other nuances of your voice when using the Speech Translation functionality in Azure. Although there is no direct way to dub your voice audio while maintaining its tones, breaks, and speed rate, there are a few options that may help you achieve your desired result.

Azure's Speech Services provide a Text-to-Speech (TTS) feature that converts text to lifelike speech. The TTS feature supports a wide range of voices and languages, and you can customize the voice's intonation, breaks, and speed.

One option is to use the Speech Synthesis Markup Language (SSML) to fine-tune the pitch, pronunciation, speaking rate, volume, and more in the text-to-speech output.
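To illustrate how SSML can be generated rather than hand-edited phrase by phrase, here is a minimal Python sketch (standard library only) that builds the markup from a list of per-phrase settings. The `Segment` fields, the `build_ssml` helper, and the default `pt-BR-FranciscaNeural` voice are illustrative assumptions; `<voice>`, `<prosody pitch="..." rate="...">`, and `<break time="..."/>` are standard SSML elements.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape, quoteattr


@dataclass
class Segment:
    """Per-phrase settings (hypothetical structure for illustration)."""
    text: str                # already-translated text for this phrase
    pitch_pct: int = 0       # relative pitch change, e.g. 10 -> "+10%"
    rate_pct: int = 0        # relative speaking-rate change, e.g. -5 -> "-5%"
    pause_after_ms: int = 0  # silence to insert after the phrase


def build_ssml(segments, voice="pt-BR-FranciscaNeural", lang="pt-BR"):
    """Wrap each phrase in <prosody> and add a <break> where a pause is requested."""
    parts = []
    for seg in segments:
        parts.append(
            f'<prosody pitch="{seg.pitch_pct:+d}%" rate="{seg.rate_pct:+d}%">'
            f"{escape(seg.text)}</prosody>"  # escape &, <, > in the spoken text
        )
        if seg.pause_after_ms > 0:
            parts.append(f'<break time="{seg.pause_after_ms}ms"/>')
    body = "".join(parts)
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>{body}</voice></speak>"
    )


if __name__ == "__main__":
    print(build_ssml([
        Segment("Hello & welcome", pitch_pct=10, rate_pct=-5, pause_after_ms=400),
        Segment("Let's begin."),
    ]))
```

The resulting string could then be passed to `SpeechSynthesizer.speak_ssml_async` in the Speech SDK for Python (`azure-cognitiveservices-speech`). How the pitch, rate, and pause values are derived from the original recording is left open here; that is the part that would still need an automated analysis step to avoid manual tuning.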

Another option is to use the Custom Voice service in Azure, which allows you to create a unique, recognizable synthetic voice that represents your brand. With Custom Voice, you can create a voice that sounds like you or someone else, and you can customize the voice's intonation, speaking style, and more. See documentation: Train your professional voice model.

If you're looking for a way to generate voices in other languages while maintaining the intonation and other nuances of the original voice, you may want to consider using the Custom Neural Voice service in Azure.

I hope this helps. Thank you.
santoshkc 4,925 Reputation points Microsoft Vendor

2024-05-06T05:31:53.3666667+00:00

Hi @Lucas,

Following up to see if the given response was helpful. Thank you.
santoshkc 4,925 Reputation points Microsoft Vendor

2024-05-07T04:57:48.79+00:00

Hi @Lucas,

We haven't heard back from you regarding the last response and wanted to check whether it was helpful. If you have found a resolution, please share it with the community, as it may help others. Thank you.
