Hi, please review the following note in our documentation.
When using the text-to-speech service, you are billed for each character that is converted to speech, including punctuation. While the SSML document itself is not billable, optional elements that are used to adjust how the text is converted to speech, like phonemes and pitch, are counted as billable characters. Here's a list of what's billable:
- Text passed to the text-to-speech service in the SSML body of the request
- All markup within the text field of the request body in the SSML format, except for <speak> and <voice> tags
- Letters, punctuation, spaces, tabs, markup, and all white-space characters
- Every code point defined in Unicode
For detailed information, see Pricing.