Use the Azure Speech-to-Text service to transcribe the audio into text.
Then copy the transcribed text, paste it into a text editor, and save the file with the .ssml extension.
Then, if necessary, add SSML markup to control the pronunciation, intonation, and other aspects of the synthesized speech.
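The copy, save, and markup steps can also be scripted. The following is only a rough sketch in Python, not an official Azure sample: it assumes the transcript is already available as a plain string, and the names wrap_in_ssml and save_ssml, the voice en-US-JennyNeural, the language en-US, and the output filename are illustrative choices, not anything the service requires. It wraps the text in a speak root element with a voice element, which an SSML file needs in addition to the raw text, and escapes characters such as & and < so the file stays well-formed.

```python
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

# Namespace defined by the W3C SSML specification.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def wrap_in_ssml(transcript, voice="en-US-JennyNeural", lang="en-US", rate=None):
    """Wrap plain transcript text in a minimal SSML document.

    The voice and lang defaults are assumptions for illustration;
    replace them with the voice and locale you actually use.
    """
    # Escape &, < and > so the transcript cannot break the XML.
    body = escape(transcript.strip())
    if rate is not None:
        # Optional markup step: <prosody rate="..."> changes the speaking rate.
        body = f"<prosody rate={quoteattr(rate)}>{body}</prosody>"
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang={quoteattr(lang)}>\n'
        f"  <voice name={quoteattr(voice)}>{body}</voice>\n"
        "</speak>\n"
    )


def save_ssml(path, ssml):
    """Check that the markup is well-formed XML, then write it to disk."""
    ET.fromstring(ssml)  # raises xml.etree.ElementTree.ParseError if malformed
    with open(path, "w", encoding="utf-8") as f:
        f.write(ssml)


if __name__ == "__main__":
    # Hypothetical transcript text and output filename.
    text = "Rock & roll is < 100 years old."
    save_ssml("transcript.ssml", wrap_in_ssml(text, rate="slow"))
```

Further markup, such as break or phoneme elements, can then be added by hand inside the voice element in the saved file.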