@stefan moisei Thanks for the details. If you haven't used it already, try the Audio Content Creation tool: http://speech.microsoft.com/audiocontentcreation. It is a UI that lets you generate audio from text and tune the output with SSML (including the phoneme element).
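If you would rather prepare the SSML yourself before pasting it into the tool, here is a minimal sketch of a phoneme-tuned snippet built in Python. The helper name `build_ssml`, the voice name `en-US-JennyNeural`, and the IPA string are only illustrative assumptions; substitute your own voice and pronunciation.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(before, word, ipa, after,
               voice="en-US-JennyNeural", lang="en-US"):
    """Build an SSML string that pronounces `word` using the IPA string `ipa`.

    `build_ssml`, the default voice, and the default language are
    hypothetical choices made for this example only.
    """
    # The phoneme element overrides the default pronunciation of the word.
    phoneme = (
        f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>'
        f'{escape(word)}</phoneme>'
    )
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang={quoteattr(lang)}>'
        f'<voice name={quoteattr(voice)}>'
        f'{escape(before)}{phoneme}{escape(after)}'
        '</voice></speak>'
    )


ssml = build_ssml("I say ", "tomato", "təˈmɑːtoʊ", ".")
print(ssml)
```

The text parts are XML-escaped, so characters such as `&` or `<` in your input will not break the markup.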
SoX is another option; see the SoX documentation for reference.
To use multiple voices or to create your own voices, you can use Custom Voice.