Batch TTS with REST: YourSynthesisId and other intro questions

Question

I got the REST API to work on macOS. Yeah!!! I could hear the output from the sample code. Now I would like to submit a longer document I wrote to batch TTS and post it as my podcast. I am taking the example right off the webpage and just substituting the environment variables, the same way the plain TTS example showed it.

curl -v -X PUT -H "Ocp-Apim-Subscription-Key: ${SPEECH_KEY}" -H "Content-Type: application/json" -d '{
    "description": "my ssml test",
    "inputKind": "SSML",
    "inputs": [
        {
            "content": "The rainbow has seven colors."
        }
    ],
    "properties": {
        "outputFormat": "riff-24khz-16bit-mono-pcm",
        "wordBoundaryEnabled": false,
        "sentenceBoundaryEnabled": false,
        "concatenateResult": false,
        "decompressOutputFiles": false
    }
}' "https://${SPEECH_REGION}.api.cognitive.microsoft.com/texttospeech/batchsyntheses/YourSynthesisId?api-version=2024-04-01"

The first correction I had to make was to add ".api" to the URL, which I would suggest MS should also show. At that point, I had a server to talk to. It would be nice if MS just used the environment variables here, too.

Now, my first attempt returned {"error":{"code":"401","message": "The Create batch text to speech job Operation under Batch Text to Speech API version 2024-04-01 is not supported with the current subscription key and pricing tier SpeechServices.F0."}}. After some attempts to simply switch the service tier, I figured this one out. First, I needed to upgrade my entire account in the portal. All good and done, and Azure tells me that this has been completed. Then I went back to my "resources" in the portal and to my "speech service" resource. Unfortunately, the Target does not offer to "move" my basic subscription, only the free tier. Creating a new "speech services" resource is similarly confusing: at the top, I am still in the free tier, but at the bottom I can now select standard rather than free. Fortunately, this now works. This process should be a lot clearer.

Next, I was wondering what YourSynthesisId is? Eventually, I figured out that it is a random (unique) string that the user can choose. So far, so good.
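A minimal Python sketch of that step, assuming any unique string is accepted as the ID (which characters and lengths the service allows has not been checked here). The helper name make_synthesis_url and the "podcast-" prefix are made up for illustration; the URL pattern is the one from the curl command above.

```python
import uuid
from typing import Optional, Tuple

API_VERSION = "2024-04-01"  # version used in the curl command above


def make_synthesis_url(region: str, synthesis_id: Optional[str] = None) -> Tuple[str, str]:
    """Return (synthesis_id, url) for a batch synthesis PUT request.

    If no ID is given, a random one is generated. The "podcast-" prefix
    is only an illustrative choice, not something the service requires.
    """
    if synthesis_id is None:
        synthesis_id = f"podcast-{uuid.uuid4()}"
    url = (
        f"https://{region}.api.cognitive.microsoft.com"
        f"/texttospeech/batchsyntheses/{synthesis_id}"
        f"?api-version={API_VERSION}"
    )
    return synthesis_id, url
```

Keeping the returned ID around is useful, since the same string presumably identifies the job in later requests.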

So, now I am at fine-tuning. How can I lower the pitch of the speaker? The speed?

I would like to upload a 20,000-character file (in Markdown) now. Presumably, I strip the Markdown markup first. Alas, there were no examples of non-SSML (PlainText) use on the website.
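A rough Python sketch of that preprocessing, under stated assumptions: strip_markdown is a hypothetical regex-based helper that handles only a few common constructs (it is not a Markdown parser), and build_plaintext_payload assumes that a "PlainText" request names its voice in a "synthesisConfig" object, which should be checked against the batch synthesis documentation. The voice name is just an example.

```python
import json
import re


def strip_markdown(text: str) -> str:
    """Remove a few common Markdown constructs from prose.

    Caveat: emphasis markers are removed everywhere, so literal
    underscores or asterisks in the text are lost as well.
    """
    text = re.sub(r"!\[([^\]]*)\]\([^)]*\)", r"\1", text)        # images -> alt text
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)         # links -> link text
    text = re.sub(r"^[ ]{0,3}#{1,6}[ \t]+", "", text, flags=re.M)  # heading markers
    text = re.sub(r"^[ ]{0,3}> ?", "", text, flags=re.M)           # blockquote markers
    text = re.sub(r"^[ \t]*[-*+][ \t]+", "", text, flags=re.M)     # bullet markers
    text = re.sub(r"(\*\*|__|\*|_|`)", "", text)                   # emphasis, code marks
    return text.strip()


def build_plaintext_payload(text: str, voice: str = "en-US-JennyNeural") -> str:
    """Build a JSON request body for plain-text input.

    Assumption: "synthesisConfig.voice" is how the voice is selected
    when "inputKind" is "PlainText".
    """
    payload = {
        "description": "podcast episode",
        "inputKind": "PlainText",
        "synthesisConfig": {"voice": voice},
        "inputs": [{"content": text}],
        "properties": {"outputFormat": "riff-24khz-16bit-mono-pcm"},
    }
    return json.dumps(payload)
```

The resulting string could be passed to curl with -d in place of the inline JSON above.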

Incidentally, are there more user-friendly interfaces somewhere? Like some scripts that process a set of text files and place .wav files next to the .txt files? Or has anyone put this together on a website to resell Azure's TTS services? They are much cheaper than, say, ElevenLabs. I wonder whether Microsoft would like someone to do this or not.

Accepted Answer

@ivo welch Have you tried the Speech Studio portal? If you log in with the same account as the one used to create the resource, you should see the Audio Content Creation tile under the TTS section, which should allow you to generate an audio file for up to 20,000 characters from a text file or an SSML-format file.

It also allows you to try different voices with modulation. This page usually helps to find the right voice settings, which can then be reused in future automation requests, including the option to use the SSML format or elements. See this page, if you haven't come across it before. I hope this helps!!
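As an illustration of what such SSML could look like once the settings are chosen, here is a minimal Python sketch that wraps text in a prosody element to lower pitch and speed. The function name build_ssml, the voice name, and the default percentages are assumptions for the example, not recommended values; the prosody element with pitch and rate attributes is standard SSML.

```python
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(text: str, voice: str = "en-US-JennyNeural",
               pitch: str = "-10%", rate: str = "-10%",
               lang: str = "en-US") -> str:
    """Wrap plain text in SSML with a prosody element.

    Negative percentages lower the pitch and slow the speech relative
    to the voice's default. The text is XML-escaped so that characters
    such as & and < do not break the document.
    """
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang={quoteattr(lang)}>'
        f"<voice name={quoteattr(voice)}>"
        f"<prosody pitch={quoteattr(pitch)} rate={quoteattr(rate)}>"
        f"{escape(text)}"
        "</prosody></voice></speak>"
    )
```

The returned string would go into the "content" field of a request whose "inputKind" is "SSML".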

If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.
