@APTENT SOLUCIONES S.L. VAT number B86324258 If you use more than one target language, billing is based on the hours of audio consumed as input. So, the charge should be $2.5 per audio hour. When multiple target languages are used, the response should be a translation dictionary, where each key is a target language and each value is the translated text for that language.
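As a rough illustration of the billing rule described above, the sketch below computes the translation charge from the input audio duration and shows the shape of a multi-language response. The function name `translation_cost` and the sample texts are hypothetical; only the $2.5 per audio hour rate comes from this answer, so please check the current pricing page before relying on it.

```python
# Rate quoted in this answer; verify against the current pricing page.
USD_PER_AUDIO_HOUR = 2.5


def translation_cost(audio_seconds: float) -> float:
    """Estimate the translation charge from input audio duration.

    Assumption (from this answer): billing depends on the input audio
    hours, not on how many target languages are requested.
    """
    return audio_seconds / 3600 * USD_PER_AUDIO_HOUR


# Illustrative response shape: key = target language, value = translated text.
translations = {"fr": "Bonjour", "de": "Hallo"}

for language, text in translations.items():
    print(f"{language}: {text}")

print(f"Estimated charge for 90 minutes of audio: ${translation_cost(90 * 60):.2f}")
```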
Since you are using multiple target languages, synthesis needs to be performed manually on the translated text, or on each entry in the dictionary of translated texts, using the voice you select for synthesis. This is charged as Text to Speech, based on the cumulative number of characters synthesized. I hope this helps!
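To make the cumulative character count concrete, here is a minimal sketch that totals the characters across a translation dictionary and applies a per-character rate. The helper names and the `usd_per_million_chars` parameter are hypothetical, and the rate is deliberately left as an input because this answer does not quote one. Note that `len()` is only an approximation: the service may count some characters differently.

```python
from typing import Dict


def synthesized_characters(translations: Dict[str, str]) -> int:
    """Total characters across every translated text (approximate)."""
    return sum(len(text) for text in translations.values())


def tts_cost(translations: Dict[str, str], usd_per_million_chars: float) -> float:
    """Estimate the Text to Speech charge for synthesizing all translations.

    usd_per_million_chars is an assumed input; take the real value from
    the pricing page for the voice type you use.
    """
    return synthesized_characters(translations) / 1_000_000 * usd_per_million_chars


# Hypothetical sample data and rate, for illustration only.
sample = {"fr": "Bonjour", "de": "Hallo"}
print(synthesized_characters(sample))
print(tts_cost(sample, usd_per_million_chars=10.0))
```

Adding this estimate to the audio-hour charge for translation gives an approximate total for the whole scenario.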
Please see the speech translation document for details on how to use the translation recognizer for text and speech translations.
If this answers your query, please click "Accept Answer" and "Yes" for "Was this answer helpful". If you have any further queries, do let us know.