I got the REST API to work on macOS. Yeah!!! I could hear the output from the sample code. Alas, now I would like to submit a longer document I wrote to batch TTS and post it as my podcast. I am taking the example right off the webpage and just substituting the environment variables, the same way the plain TTS example showed it:
curl -v -X PUT -H "Ocp-Apim-Subscription-Key: ${SPEECH_KEY}" -H "Content-Type: application/json" -d '{
"description": "my ssml test",
"inputKind": "SSML",
"inputs": [
{
"content": "<speak version=\"1.0\" xml:lang=\"en-US\"><voice name=\"en-US-JennyNeural\">The rainbow has seven colors.</voice></speak>"
}
],
"properties": {
"outputFormat": "riff-24khz-16bit-mono-pcm",
"wordBoundaryEnabled": false,
"sentenceBoundaryEnabled": false,
"concatenateResult": false,
"decompressOutputFiles": false
}
}' "https://${SPEECH_REGION}.api.cognitive.microsoft.com/texttospeech/batchsyntheses/YourSynthesisId?api-version=2024-04-01"
The first correction I had to make was to add ".api" to the host name in the URL, which I would suggest MS should also show. At that point, I had a server to talk to. It would be nice if MS just used the environment variables here, too.
Now, my first attempt returned {"error":{"code":"401","message": "The Create batch text to speech job Operation under Batch Text to Speech API version 2024-04-01 is not supported with the current subscription key and pricing tier SpeechServices.F0."}}. After some attempts to just switch the service tier, I figured this one out. First, I needed to upgrade my entire account in the portal. All good and done, and Azure tells me that this has been completed. Now I go back to my "resources" in the portal, and then to my resource of type "speech service". Unfortunately, the target does not offer to let me "move" my basic subscription; it only shows the free tier. Creating a new "speech services" resource is similarly confusing: at the top, I am still in the free tier, but at the bottom I can now select standard rather than free. Fortunately, this now works. This process should be a lot clearer.
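For anyone who prefers the command line to the portal, the same step can probably be done with the Azure CLI. This is only a sketch under assumptions: the az CLI is installed and logged in, the resource group already exists, and the function name and its three arguments are placeholders I made up. S0 is the standard tier, as opposed to the F0 named in the error message.

```shell
# Hypothetical helper: create a Speech resource on the standard (S0) tier.
# Assumes the Azure CLI is installed, `az login` has been run,
# and the resource group already exists.
create_standard_speech_resource() {
  name="$1"            # placeholder, e.g. my-speech-s0
  resource_group="$2"  # placeholder, e.g. my-resource-group
  region="$3"          # e.g. eastus; this is also the value for SPEECH_REGION
  az cognitiveservices account create \
    --name "$name" \
    --resource-group "$resource_group" \
    --kind SpeechServices \
    --sku S0 \
    --location "$region" \
    --yes
  # List the keys of the new resource; one of them goes into SPEECH_KEY.
  az cognitiveservices account keys list \
    --name "$name" \
    --resource-group "$resource_group"
}
```

Calling it would look like create_standard_speech_resource my-speech-s0 my-resource-group eastus, after which SPEECH_KEY has to be updated to a key of the new resource.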
Next, I was wondering what YourSynthesisId is. Eventually, I figured out that it is a unique string that the user chooses. So far, so good.
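Since the ID only has to be unique, it can be generated instead of typed by hand. A minimal sketch: the helper name and the prefix-timestamp-PID scheme are my own invention, and I am assuming that letters, digits, and hyphens are safe characters for the ID.

```shell
# Hypothetical helper: build an ID from a prefix, a timestamp, and the shell's PID.
# Assumption: letters, digits, and hyphens are accepted in a synthesis ID.
make_synthesis_id() {
  printf '%s-%s-%s\n' "${1:-podcast}" "$(date +%Y%m%d%H%M%S)" "$$"
}

SYNTHESIS_ID="$(make_synthesis_id episode1)"

# Same URL as in the curl command, with the generated ID in place of YourSynthesisId.
URL="https://${SPEECH_REGION}.api.cognitive.microsoft.com/texttospeech/batchsyntheses/${SYNTHESIS_ID}?api-version=2024-04-01"
echo "$URL"
```

It is worth keeping the ID around, since a GET on the same URL should, as far as I can tell, report the status of the job later.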
So, now I am at fine-tuning. How can I lower the pitch of the speaker? The speed?
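In SSML, pitch and speed are normally set with a prosody element around the text, using its pitch and rate attributes. A small sketch that builds such a string and escapes it for the JSON "content" field; the -10% and -5% values are arbitrary starting points I picked, not recommendations, and would need tuning by ear.

```shell
# Assumed starting values: negative percentages mean slower / lower than the default.
RATE="-10%"
PITCH="-5%"
TEXT="The rainbow has seven colors."

# Same SSML as in the curl example, with the text wrapped in <prosody>.
SSML="<speak version=\"1.0\" xml:lang=\"en-US\"><voice name=\"en-US-JennyNeural\"><prosody rate=\"${RATE}\" pitch=\"${PITCH}\">${TEXT}</prosody></voice></speak>"

# Escape the double quotes so the SSML can sit inside a JSON string.
# (Only quotes are handled; backslashes or newlines in TEXT would need more care.)
SSML_JSON="$(printf '%s' "$SSML" | sed 's/"/\\"/g')"
printf '%s\n' "$SSML_JSON"
```

The printed string can be pasted as the value of "content" in the request body, in place of the original speak element.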
I would like to upload a 20,000-character file (in Markdown) now. Presumably, I strip the Markdown markup first. Alas, there were no examples of non-SSML (PlainText) use on the website.
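A rough sketch of both steps follows. The sed filter only removes the most common Markdown markup (headings, list bullets, links, images, asterisk emphasis, inline code) and is nowhere near a full Markdown parser. The request body assumes, from my reading of the API reference, that "inputKind" can be "PlainText" and that the voice then goes into a "synthesisConfig" block; both function names are made up for illustration.

```shell
# Hypothetical filter: strip common Markdown markup from stdin.
strip_markdown() {
  sed -E \
    -e 's/^#{1,6}[[:space:]]+//' \
    -e 's/^[[:space:]]*[-*+][[:space:]]+//' \
    -e 's/!\[([^]]*)\]\([^)]*\)/\1/g' \
    -e 's/\[([^]]*)\]\([^)]*\)/\1/g' \
    -e 's/\*\*([^*]+)\*\*/\1/g' \
    -e 's/\*([^*]+)\*/\1/g' \
    -e 's/`([^`]*)`/\1/g'
}

# Hypothetical helper: print a PlainText request body for one piece of text.
# $1 must not contain double quotes, backslashes, or newlines,
# because no JSON escaping is done here.
# Assumption: "synthesisConfig"/"voice" is how the voice is chosen for PlainText.
plaintext_payload() {
  cat <<EOF
{
  "description": "my plain text test",
  "inputKind": "PlainText",
  "inputs": [
    { "content": "$1" }
  ],
  "synthesisConfig": {
    "voice": "en-US-JennyNeural"
  },
  "properties": {
    "outputFormat": "riff-24khz-16bit-mono-pcm"
  }
}
EOF
}

# Example: clean one Markdown line and wrap it in a request body.
CLEAN="$(printf '# My *first* episode\n' | strip_markdown)"
plaintext_payload "$CLEAN"
```

The output of plaintext_payload would be passed to curl with -d, using the same headers and URL as in the SSML example. For a whole document, the text still needs proper JSON escaping, and it may be simpler to send one "inputs" entry per paragraph, if the service accepts several entries, which I have not verified.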
Incidentally, are there more user-friendly interfaces somewhere? Like some scripts that process a set of text files and place .wav files next to the .txt files? Or has anyone put this together on a website to resell Azure's TTS services? They are much cheaper than, say, ElevenLabs. I wonder whether Microsoft would like someone to do this or not.