Is there any way to dub audio while maintaining its original intonation, breaks and speed?

Lucas

I have a voice recording with many lower and higher tones, pauses, and word emphasis at specific moments. However, when I use the "Speech Translation" functionality, the audio loses all of its life (all of this complexity) and becomes a blatantly monotone, AI-generated voice.

I've seen that the "Audio Content Creation" functionality makes it possible to fine-tune and "correct" this problem (adding breaks and intonation, and reducing or increasing the speed). As far as I'm aware, though, that still requires manual work on each individual phrase (or word), which becomes unbearable for long recordings.

Having said that, here are my questions:

Is there any way to directly dub my voice audio while maintaining its tones, breaks and speaking rate?

If that's not possible, are there any options that do not require manual fine-tuning?
Maybe some sort of Speech-to-Text that transcribes the original audio and annotates it with its <contour>, <breaks>, etc., so that this "well-informed transcription" could later be used as a basis for generating voices in other languages while still maintaining all of the "life" of the original voice.
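
To make the idea concrete, here is a rough sketch of what I imagine. It assumes that some earlier step (which I have not found) already produces per-phrase rate, pitch and pause values along with the translated text. The `Segment` class and the `segments_to_ssml` function are hypothetical names of my own, not part of any Azure SDK, and the voice name is only an example. The output uses the standard SSML `<prosody>` and `<break>` elements.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"


@dataclass
class Segment:
    """Hypothetical output of a prosody-aware transcription step."""
    text: str            # translated text for this phrase
    rate: str = "+0%"    # relative speaking rate, e.g. "-10%"
    pitch: str = "+0%"   # relative pitch shift, e.g. "+5%"
    pause_ms: int = 0    # silence to insert after the phrase


def segments_to_ssml(segments, voice="en-US-JennyNeural", lang="en-US"):
    """Build one SSML document from annotated segments (illustrative only)."""
    parts = [
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang={quoteattr(lang)}>',
        f"<voice name={quoteattr(voice)}>",
    ]
    for seg in segments:
        # Wrap each phrase in its own <prosody> so rate and pitch vary per phrase.
        parts.append(
            f"<prosody rate={quoteattr(seg.rate)} pitch={quoteattr(seg.pitch)}>"
            f"{escape(seg.text)}</prosody>"
        )
        if seg.pause_ms > 0:
            parts.append(f'<break time="{seg.pause_ms}ms"/>')
    parts.append("</voice></speak>")
    ssml = "".join(parts)
    ET.fromstring(ssml)  # raises ParseError if the markup is malformed
    return ssml


if __name__ == "__main__":
    demo = [
        Segment("Hello & welcome", rate="-10%", pitch="+5%", pause_ms=400),
        Segment("Let's begin."),
    ]
    print(segments_to_ssml(demo))
```

If something could fill in those `Segment` values automatically from the original recording, the manual step in "Audio Content Creation" would no longer be needed.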

My main concern is not being dependent on manual adjustments. Anything that solves this would be sufficient.

Thanks in advance. I hope to find a solution here.

Azure AI Speech
An Azure service that integrates speech processing into apps and services.