Am I right that the Long Audio API is very clunky and limited when it comes to voice selection? Am I right to eschew/forswear/bag it in favor of using the regular synthesizer to make short files and stitch them together?

Stephen Cummings 6 Reputation points
2021-08-01T16:02:33.057+00:00

Am I right that even if you configure several separate Speech resources, one per region, you still can't access all of the neural voices through Long Audio? In my tests so far, the voices returned by get_voices() (I'm on Python) are, for each region that returns any, a freakily random set. A resource configured for the 'centralindia' region returns no Hindi or other Indian voices. At the moment I need Hindi, Mandarin Chinese, Norwegian Bokmål, and English in various flavors.

I should forget Long Audio, right? Or is there a secret set of steps to follow to get to full access to all the voices?
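
For concreteness, the stitch-together alternative could look roughly like the sketch below. It is only an illustration under stated assumptions, not an official recipe: `chunk_text`, `concat_wavs`, `synthesize_long`, and the `synthesize_chunk` callback are hypothetical names, the 2000-character chunk size is an arbitrary guess rather than a documented limit, and the callback stands in for whatever regular-synthesizer call writes one chunk to a WAV file. It also assumes every chunk is written in the same WAV format.

```python
import os
import re
import wave
from typing import Callable, Iterable, List


def chunk_text(text: str, max_chars: int = 2000) -> List[str]:
    """Split text into chunks of at most max_chars, breaking at sentence ends."""
    # Latin punctuation needs following whitespace; CJK/Devanagari stops do not.
    pattern = r"(?<=[.!?])\s+|(?<=[。！？।])\s*"
    sentences = [s.strip() for s in re.split(pattern, text) if s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split a single oversized sentence so no chunk exceeds the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def concat_wavs(paths: Iterable[str], out_path: str) -> None:
    """Concatenate WAV files that share one audio format into out_path."""
    fmt = None
    frames = []
    for path in paths:
        with wave.open(path, "rb") as w:
            this_fmt = (w.getnchannels(), w.getsampwidth(), w.getframerate())
            if fmt is None:
                fmt = this_fmt
            elif this_fmt != fmt:
                raise ValueError(f"{path}: format {this_fmt} differs from {fmt}")
            frames.append(w.readframes(w.getnframes()))
    if fmt is None:
        raise ValueError("no input files")
    with wave.open(out_path, "wb") as out:
        out.setnchannels(fmt[0])
        out.setsampwidth(fmt[1])
        out.setframerate(fmt[2])
        for data in frames:
            out.writeframes(data)


def synthesize_long(
    text: str,
    synthesize_chunk: Callable[[str, str], None],  # hypothetical: (text, wav_path)
    out_path: str,
    workdir: str,
    max_chars: int = 2000,
) -> List[str]:
    """Synthesize each chunk to its own WAV file, then stitch them together."""
    paths = []
    for i, chunk in enumerate(chunk_text(text, max_chars)):
        path = os.path.join(workdir, f"part_{i:04d}.wav")
        synthesize_chunk(chunk, path)
        paths.append(path)
    concat_wavs(paths, out_path)
    return paths
```

Here `synthesize_chunk` would be a small wrapper around the regular Speech SDK synthesizer, configured with whichever voice is needed and writing its output to `wav_path`. Breaking at sentence boundaries keeps the joins from landing mid-phrase; the pattern is naive about abbreviations and decimals, so real text may need a smarter splitter.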

I was very happy with my initial results with the speech synthesizer SDK and am Microsoft-leaning, so I haven't investigated the competition yet. Does anyone know whether Amazon Polly or Google provides a simpler path to voice/language options when converting long text files to speech?

Azure AI Speech