Azure AI Text-to-Speech Python SDK - Status code 404 - environment variables are set - what am I doing wrong?

Question

ilikeusingazure 20

Forgive me, everyone; this is my first attempt at using an Azure service.

I'm a little disappointed because doing the same thing (successfully) with AWS's Polly service took me 10 minutes max.

I have spent the last couple of hours trying to get this very basic text-to-speech example to work and have failed so far.

I did have the SPEECH_KEY and SPEECH_REGION environment variables set up right from the beginning, so this answer is probably not the solution for me. I obtained the SPEECH_KEY from the Keys and Endpoint section of the Azure AI Speech Services resource I created for this. SpeechConfig wants a subscription key and/or an auth_token (???), and/or a region and/or the endpoint (???)...the docs are not really precise.

So here is what happens when I use the exact code from the very basic text-to-speech example:

(azurevenv) PS C:\Users\the_user\azuretest> setx SPEECH_REGION northeurope

SUCCESS: Specified value was saved.
(azurevenv) PS C:\Users\the_user\azuretest> setx SPEECH_KEY [redacted]

SUCCESS: Specified value was saved.
(azurevenv) PS C:\Users\the_user\azuretest> python .\exact_example.py
Traceback (most recent call last):
  File "C:\Users\the_user\azuretest\exact_example.py", line 5, in <module>
    speech_config = speechsdk.SpeechConfig(subscription=os.environ.get('SPEECH_KEY'), region=os.environ.get('SPEECH_REGION'))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\the_user\azuretest\azurevenv\Lib\site-packages\azure\cognitiveservices\speech\speech.py", line 84, in __init__
    raise ValueError(generic_error_message)
ValueError: cannot construct SpeechConfig with the given arguments
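
One possible cause of the ValueError above, offered as an assumption rather than a confirmed diagnosis: setx saves a variable for future sessions and does not update the shell it is run in, so a script started from the same PowerShell window may still see both variables as None. A minimal sketch for checking this before constructing SpeechConfig (the helper name missing_speech_settings is hypothetical, not part of the SDK):

```python
import os


def missing_speech_settings(env=None, names=("SPEECH_KEY", "SPEECH_REGION")):
    """Return the names of required settings that are unset or empty.

    `env` defaults to the current process environment; passing a plain
    dict makes the check easy to try out without touching real variables.
    """
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_speech_settings()
    if missing:
        # Reopening the terminal after `setx` (or setting the variable for
        # the current session as well) should make the values visible here.
        print("Not visible to this process:", ", ".join(missing))
    else:
        print("SPEECH_KEY and SPEECH_REGION are both set for this process.")
```

If the script reports a missing name right after setx, opening a new terminal window before running the example again is a quick way to test this explanation.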

I made progress but it won't give me anything but a 404.

My current iteration of the very basic text-to-speech example doesn't complain about my SpeechConfig anymore; the output looks like this (full code below):

(azurevenv) PS C:\Users\the_user\azuretest> python .\cog_serv_speech_tts.py
Enter the text to be speech-synthesised: Hello there, why do you give me 404?
Speech synthesis canceled: CancellationReason.Error
Error details: TTS request failed: Internal service error (404). Error Details:  Resource Not Found Please check request details.
Did you set the speech resource key and region values?
Script ran standalone and was not imported.

I tried using the endpoint instead of the region, and I tried passing the SPEECH_KEY as both SpeechConfig's subscription key and its auth_token. Nothing worked. I also followed this troubleshooting guide and did get the OAuthToken; however, I could not follow the rest of the guide because PowerShell couldn't find the 2-second audio file in the $pwd I was working in.

Following the speech synthesis example from GitHub makes me think that me being a noob is the problem. What exactly is SpeechConfig's subscription key? Am I wrong to use one of the two keys from the Keys and Endpoint section of the Azure AI Speech Services resource?

I have no idea what I'm doing wrong. Can anyone help?

Here is the code belonging to the error (404) output above:

(azurevenv) PS C:\Users\the_user\azuretest> cat .\last_iteration_of_speech_tts_test.py
# pip install azure-cognitiveservices-speech
# https://learn.microsoft.com/en-us/azure/ai-services/speech-service/get-started-text-to-speech
# expects SPEECH_KEY
# set|setx SPEECH_KEY the_resource_key from Keys and Endpoint section
import os
import azure.cognitiveservices.speech as speechsdk
def main():
    # ValueError: cannot construct SpeechConfig with both region and endpoint or host information
    # ValueError: either subscription key or authorization token must be given along with a region
    # https://learn.microsoft.com/en-us/python/api/azure-cognitiveservices-speech/azure.cognitiveservices.speech.speech?view=azure-python
    speech_config = speechsdk.SpeechConfig(subscription=os.environ.get('SPEECH_KEY'), endpoint="https://northeurope.api.cognitive.microsoft.com/")
    audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)
    # https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support?tabs=tts
    speech_config.speech_synthesis_voice_name='en-US-JennyNeural'
    speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)
    TEXT = input("Enter the text to be speech-synthesised: ")
    speech_synthesis_result = speech_synthesizer.speak_text_async(TEXT).get()
    if speech_synthesis_result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
        print("Speech synthesized for text [{}]".format(TEXT))
    elif speech_synthesis_result.reason == speechsdk.ResultReason.Canceled:
        cancellation_details = speech_synthesis_result.cancellation_details
        print("Speech synthesis canceled: {}".format(cancellation_details.reason))
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            if cancellation_details.error_details:
                print("Error details: {}".format(cancellation_details.error_details))
                print("Did you set the speech resource key and region values?")
if __name__ == "__main__":
    main()

2 answers

Answer 1

Hi @ilikeusingazure

Thank you for using the Microsoft Q&A forum.

It seems that the SpeechConfig object does not expose the endpoint URL it ends up using. When you pass only a subscription key and a region, it constructs the endpoint internally.

To see which endpoint is being used, you can build the URL manually from the region, print it, and pass it to SpeechConfig through the endpoint parameter. Here is the modified code:

import azure.cognitiveservices.speech as speechsdk
def main():
    subscription_key = "YOUR_SPEECH_KEY"
    region = "YOUR_SPEECH_REGION"
    # Construct endpoint URL based on the provided region
    endpoint_url = "https://" + region + ".api.cognitive.microsoft.com/sts/v1.0/issuetoken"
    # Print the constructed endpoint URL
    print("Constructed Endpoint URL:", endpoint_url)
    # Rest of your code.
    speech_config = speechsdk.SpeechConfig(subscription=subscription_key, endpoint=endpoint_url)
    audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)
    speech_config.speech_synthesis_voice_name = 'en-US-JennyMultilingualNeural'
    speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)
    TEXT = input("Enter the text to be speech-synthesized: ")
    speech_synthesis_result = speech_synthesizer.speak_text_async(TEXT).get()
    if speech_synthesis_result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
        print("Speech synthesized for text [{}]".format(TEXT))
    elif speech_synthesis_result.reason == speechsdk.ResultReason.Canceled:
        cancellation_details = speech_synthesis_result.cancellation_details
        print("Speech synthesis canceled: {}".format(cancellation_details.reason))
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            if cancellation_details.error_details:
                print("Error details: {}".format(cancellation_details.error_details))
                print("Did you set the speech resource key and region values?")
if __name__ == "__main__":
    main()

Output: [screenshot]
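
The URL built in the code above ends in the resource's token-issuing path. As a way to check a key and region independently of the SDK, the sketch below builds a POST request to that path with the key in the Ocp-Apim-Subscription-Key header, as described in the Speech REST API documentation. The function names are hypothetical, and only fetch_token performs a network call.

```python
import urllib.request


def build_token_request(region: str, key: str) -> urllib.request.Request:
    """Build (but do not send) a token request for a Speech resource."""
    url = f"https://{region}.api.cognitive.microsoft.com/sts/v1.0/issueToken"
    return urllib.request.Request(
        url,
        data=b"",  # the token endpoint expects an empty POST body
        method="POST",
        headers={"Ocp-Apim-Subscription-Key": key},
    )


def fetch_token(region: str, key: str, timeout: float = 10.0) -> str:
    """Send the request and return the response body as text.

    urllib raises urllib.error.HTTPError on a 4xx/5xx response, which is a
    hint that the key and region do not belong to the same resource.
    """
    request = build_token_request(region, key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling fetch_token("northeurope", key) with a real key should return a token string; an HTTP error at this step points at the key or region rather than at the synthesis code.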

I also tried passing the subscription key and region directly. Below is the repro I ran on my end, which worked as expected.

import os
import azure.cognitiveservices.speech as speechsdk
def main():
    # ValueError: cannot construct SpeechConfig with both region and endpoint or host information
    # ValueError: either subscription key or authorization token must be given along with a region
    # https://learn.microsoft.com/en-us/python/api/azure-cognitiveservices-speech/azure.cognitiveservices.speech.speech?view=azure-python
    speech_config = speechsdk.SpeechConfig(subscription="YOUR_SPEECH_KEY",
                                           region="YOUR_SPEECH_REGION")
    audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)
    # https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support?tabs=tts
    speech_config.speech_synthesis_voice_name='en-US-JennyMultilingualNeural'
    speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)
    TEXT = input("Enter the text to be speech-synthesised: ")
    speech_synthesis_result = speech_synthesizer.speak_text_async(TEXT).get()
    if speech_synthesis_result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
        print("Speech synthesized for text [{}]".format(TEXT))
    elif speech_synthesis_result.reason == speechsdk.ResultReason.Canceled:
        cancellation_details = speech_synthesis_result.cancellation_details
        print("Speech synthesis canceled: {}".format(cancellation_details.reason))
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            if cancellation_details.error_details:
                print("Error details: {}".format(cancellation_details.error_details))
                print("Did you set the speech resource key and region values?")
if __name__ == "__main__":
    main()

Output: [screenshot]
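
The two ValueError messages quoted in the code comments above suggest that SpeechConfig accepts either a region or an endpoint, but not both, and that a region needs a key or an authorization token alongside it. The sketch below checks those combinations before the SDK is called. The helper speech_config_kwargs is hypothetical; it mirrors only the quoted messages, not the SDK's actual validation, and it is stricter than the SDK in always requiring a credential.

```python
def speech_config_kwargs(key=None, auth_token=None, region=None, endpoint=None):
    """Return keyword arguments for SpeechConfig, or raise ValueError early."""
    if region and endpoint:
        raise ValueError("pass either a region or an endpoint, not both")
    if not region and not endpoint:
        raise ValueError("pass a region or an endpoint")
    if not key and not auth_token:
        raise ValueError("pass a subscription key or an authorization token")

    # Exactly one of region/endpoint is set at this point.
    kwargs = {"region": region} if region else {"endpoint": endpoint}
    if key:
        kwargs["subscription"] = key
    else:
        kwargs["auth_token"] = auth_token
    return kwargs


# Intended use with the SDK (not executed here):
#   import azure.cognitiveservices.speech as speechsdk
#   config = speechsdk.SpeechConfig(**speech_config_kwargs(key=KEY, region=REGION))
```

Failing early with a specific message makes it easier to tell a missing environment variable from a conflicting region and endpoint.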

I hope this helps. Thank you.

Answer 2

ilikeusingazure 20

@dupammi you've just become my personal hero. It works. I suspected that I should try some endpoints from the REST API docs.

Are all Azure-related Python SDKs so bad? It really looks like a product that was 90% completed, 60% documented, and then abandoned. How is anyone expected to work efficiently with something like that? Would you recommend avoiding the Azure Python SDK and using the REST APIs instead?

The question I have is why it displays an ~~wrong~~ incomplete endpoint description in Azure Portal (or perhaps I don't understand how things are meant yet?!) and why the SDK cannot construct the required endpoint properly:

[Screenshot 2024-03-12 133913]

dupammi 8,615 Reputation points Microsoft External Staff

2024-03-13T06:01:24.8366667+00:00

Hi @ilikeusingazure

I'm glad to hear that my suggestions were helpful.

Regarding your question about Azure-related Python SDKs, I would like to clarify that Microsoft is committed to providing high-quality SDKs for all its services, including Azure. The Python SDKs are actively maintained and updated, and Microsoft encourages developers to use them for building applications on Azure.

Regarding the incomplete endpoint description in the Azure Portal, I would recommend reaching out to Microsoft support for further assistance with this issue.

I have converted my earlier response to an answer, in case you'd like to accept it; it may also help other community members who read this thread looking for a solution. Thank you!
