Azure AI Text-to-Speech Python SDK - Status code 404 - environment variables are set - what am I doing wrong?

ilikeusingazure 20 Reputation points
2024-03-11T22:04:16.9966667+00:00

Forgive me, everyone; this is my first attempt at using an Azure service.

I'm a little disappointed because doing the same thing (successfully) with AWS's Polly service took me 10 minutes max.

So I spent the last couple of hours trying to get this very basic text-to-speech example to work and have failed so far.

I did have the SPEECH_KEY and SPEECH_REGION environment variables set up right from the beginning, so this answer is probably not the solution for me. I obtained the SPEECH_KEY from the Keys and Endpoint section of the Azure AI Speech Services resource I created for this. SpeechConfig wants a subscription key and/or an auth_token (???), and/or a region and/or the endpoint (???)...the docs are not really precise.

So here is what happens when I use the exact code from the very basic text-to-speech example:

(azurevenv) PS C:\Users\the_user\azuretest> setx SPEECH_REGION northeurope

SUCCESS: Specified value was saved.
(azurevenv) PS C:\Users\the_user\azuretest> setx SPEECH_KEY [redacted]

SUCCESS: Specified value was saved.
(azurevenv) PS C:\Users\the_user\azuretest> python .\exact_example.py
Traceback (most recent call last):
  File "C:\Users\the_user\azuretest\exact_example.py", line 5, in <module>
    speech_config = speechsdk.SpeechConfig(subscription=os.environ.get('SPEECH_KEY'), region=os.environ.get('SPEECH_REGION'))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\the_user\azuretest\azurevenv\Lib\site-packages\azure\cognitiveservices\speech\speech.py", line 84, in __init__
    raise ValueError(generic_error_message)
ValueError: cannot construct SpeechConfig with the given arguments
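
One thing the session above may be showing: `setx` stores a variable for future processes only, so a shell that was already open when it ran does not see the new value, and `os.environ.get` returns `None` there. A minimal sketch for checking this before building a SpeechConfig, using only the standard library (the helper name `read_speech_settings` is made up for illustration):

```python
import os


def read_speech_settings(env=None):
    """Return (key, region) or raise, naming the variables that are missing."""
    # Hypothetical helper: `env` defaults to the real process environment,
    # but a plain dict can be passed in to try the check without touching it.
    env = os.environ if env is None else env
    names = ("SPEECH_KEY", "SPEECH_REGION")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Not visible in this process: " + ", ".join(missing)
            + ". After setx, open a new terminal before running the script."
        )
    return env["SPEECH_KEY"], env["SPEECH_REGION"]


if __name__ == "__main__":
    try:
        key, region = read_speech_settings()
        print("Both variables are visible; region =", region)
    except RuntimeError as err:
        print(err)
```

In PowerShell, `$env:SPEECH_KEY = "..."` works the other way round: it sets the value for the current session only.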

I made progress but it won't give me anything but a 404.

My current iteration of the very basic text-to-speech example no longer complains about my SpeechConfig; the output looks like this (full code below):

(azurevenv) PS C:\Users\the_user\azuretest> python .\cog_serv_speech_tts.py
Enter the text to be speech-synthesised: Hello there, why do you give me 404?
Speech synthesis canceled: CancellationReason.Error
Error details: TTS request failed: Internal service error (404). Error Details:  Resource Not Found Please check request details.
Did you set the speech resource key and region values?
Script ran standalone and was not imported.


I tried using the endpoint instead of the region, and I tried passing the SPEECH_KEY as SpeechConfig's subscription key and as its auth_token. Nothing worked. I also followed this troubleshooting guide and did get the OAuthToken; however, I could not follow the rest of the guide because PowerShell couldn't find the 2-second audio file in the $pwd I was working in.

Following the speech synthesis example from GitHub makes me think that me being a noob is the problem. What exactly is SpeechConfig's subscription key? Am I wrong to use one of the two keys from the Keys and Endpoint section of the Azure AI Speech Services resource?

I have no idea what I'm doing wrong. Can anyone help?

Here is the code belonging to the error (404) output above:

(azurevenv) PS C:\Users\the_user\azuretest> cat .\last_iteration_of_speech_tts_test.py
# pip install azure-cognitiveservices-speech
# https://learn.microsoft.com/en-us/azure/ai-services/speech-service/get-started-text-to-speech
# expects SPEECH_KEY
# set|setx SPEECH_KEY the_resource_key from Keys and Endpoint section
import os
import azure.cognitiveservices.speech as speechsdk
def main():
    # ValueError: cannot construct SpeechConfig with both region and endpoint or host information
    # ValueError: either subscription key or authorization token must be given along with a region
    # https://learn.microsoft.com/en-us/python/api/azure-cognitiveservices-speech/azure.cognitiveservices.speech.speech?view=azure-python
    speech_config = speechsdk.SpeechConfig(subscription=os.environ.get('SPEECH_KEY'), endpoint="https://northeurope.api.cognitive.microsoft.com/")
    audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)
    # https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support?tabs=tts
    speech_config.speech_synthesis_voice_name='en-US-JennyNeural'
    speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)
    TEXT = input("Enter the text to be speech-synthesised: ")
    speech_synthesis_result = speech_synthesizer.speak_text_async(TEXT).get()
    if speech_synthesis_result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
        print("Speech synthesized for text [{}]".format(TEXT))
    elif speech_synthesis_result.reason == speechsdk.ResultReason.Canceled:
        cancellation_details = speech_synthesis_result.cancellation_details
        print("Speech synthesis canceled: {}".format(cancellation_details.reason))
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            if cancellation_details.error_details:
                print("Error details: {}".format(cancellation_details.error_details))
                print("Did you set the speech resource key and region values?")
if __name__ == "__main__":
    main()
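
The two ValueError messages quoted in the comments of this script hint at which argument combinations SpeechConfig accepts. The sketch below restates just those two rules as a plain-Python pre-check. It does not import the SDK, `check_speech_config_args` is a hypothetical helper, and the real constructor may well enforce more rules than these two.

```python
def check_speech_config_args(subscription=None, auth_token=None,
                             region=None, endpoint=None, host=None):
    """Return a list of problems; an empty list means neither rule is violated."""
    problems = []
    # Rule read off the first quoted message: region excludes endpoint/host.
    if region and (endpoint or host):
        problems.append("region cannot be combined with endpoint or host")
    # Rule read off the second quoted message: region needs a credential.
    if region and not (subscription or auth_token):
        problems.append("region needs a subscription key or an authorization token")
    return problems


if __name__ == "__main__":
    # Key plus region: passes both rules.
    print(check_speech_config_args(subscription="key", region="northeurope"))
    # Region plus endpoint: violates the first rule.
    print(check_speech_config_args(subscription="key", region="northeurope",
                                   endpoint="https://example.invalid/"))
```

A check like this only catches conflicting arguments. It says nothing about whether a given endpoint URL is the one the service expects, which is what a 404 points at.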
2 answers

  1. dupammi 5,735 Reputation points Microsoft Vendor
    2024-03-12T09:55:32.2333333+00:00

    Hi @ilikeusingazure

    Thank you for using the Microsoft Q&A forum.

    It seems that the SpeechConfig object does not directly expose the constructed endpoint URL attribute. Instead, it constructs the endpoint internally based on the provided subscription key and region.

    To print the constructed endpoint URL, you can create the endpoint URL manually based on the provided region. Here's how you can modify the code to print the constructed endpoint URL based on the provided subscription key and region:

    import azure.cognitiveservices.speech as speechsdk
    def main():
        subscription_key = "YOUR_SPEECH_KEY"
        region = "YOUR_SPEECH_REGION"
        # Construct endpoint URL based on the provided region
        endpoint_url = "https://" + region + ".api.cognitive.microsoft.com/sts/v1.0/issuetoken"
        # Print the constructed endpoint URL
        print("Constructed Endpoint URL:", endpoint_url)
        # Rest of your code.
        speech_config = speechsdk.SpeechConfig(subscription=subscription_key, endpoint=endpoint_url)
        audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)
        speech_config.speech_synthesis_voice_name = 'en-US-JennyMultilingualNeural'
        speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)
        TEXT = input("Enter the text to be speech-synthesized: ")
        speech_synthesis_result = speech_synthesizer.speak_text_async(TEXT).get()
        if speech_synthesis_result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
            print("Speech synthesized for text [{}]".format(TEXT))
        elif speech_synthesis_result.reason == speechsdk.ResultReason.Canceled:
            cancellation_details = speech_synthesis_result.cancellation_details
            print("Speech synthesis canceled: {}".format(cancellation_details.reason))
            if cancellation_details.reason == speechsdk.CancellationReason.Error:
                if cancellation_details.error_details:
                    print("Error details: {}".format(cancellation_details.error_details))
                    print("Did you set the speech resource key and region values?")
    if __name__ == "__main__":
        main()
    
    

    Output: [screenshot]
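
The string concatenation in the code above can be wrapped in a small helper so that the region is checked before it ends up in a URL. This is only a sketch: `build_endpoint_url` is a made-up name, the URL pattern is copied from the code in this answer rather than taken from the SDK, and the region check (lowercase letters and digits only) is an assumption about what region identifiers look like.

```python
import re


def build_endpoint_url(region):
    """Build the endpoint URL for a region, following the pattern used above."""
    region = region.strip().lower()
    # Assumption: region identifiers such as "northeurope" contain only
    # lowercase letters and digits; anything else is rejected early.
    if not re.fullmatch(r"[a-z0-9]+", region):
        raise ValueError("unexpected region name: {!r}".format(region))
    return "https://" + region + ".api.cognitive.microsoft.com/sts/v1.0/issuetoken"


if __name__ == "__main__":
    print("Constructed Endpoint URL:", build_endpoint_url("northeurope"))
```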

    I also tried setting the subscription key and region. Below is the repro I tried at my end, and it was working as expected.

    import os
    import azure.cognitiveservices.speech as speechsdk
    def main():
        # ValueError: cannot construct SpeechConfig with both region and endpoint or host information
        # ValueError: either subscription key or authorization token must be given along with a region
        # https://learn.microsoft.com/en-us/python/api/azure-cognitiveservices-speech/azure.cognitiveservices.speech.speech?view=azure-python
        speech_config = speechsdk.SpeechConfig(subscription="YOUR_SPEECH_KEY",
                                               region="YOUR_SPEECH_REGION")
        audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)
        # https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support?tabs=tts
        speech_config.speech_synthesis_voice_name='en-US-JennyMultilingualNeural'
        speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)
        TEXT = input("Enter the text to be speech-synthesised: ")
        speech_synthesis_result = speech_synthesizer.speak_text_async(TEXT).get()
        if speech_synthesis_result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
            print("Speech synthesized for text [{}]".format(TEXT))
        elif speech_synthesis_result.reason == speechsdk.ResultReason.Canceled:
            cancellation_details = speech_synthesis_result.cancellation_details
            print("Speech synthesis canceled: {}".format(cancellation_details.reason))
            if cancellation_details.reason == speechsdk.CancellationReason.Error:
                if cancellation_details.error_details:
                    print("Error details: {}".format(cancellation_details.error_details))
                    print("Did you set the speech resource key and region values?")
    if __name__ == "__main__":
        main()
    

    Output: [screenshot]

    I hope this helps. Thank you.


  2. ilikeusingazure 20 Reputation points
    2024-03-12T12:53:25.53+00:00

    @dupammi you've just become my personal hero. It works. I suspected that I should try some endpoints from the REST API docs.

    Are all Azure-related Python SDKs this bad? It really looks like a product that was 90% completed, 60% documented and then abandoned. How is anyone expected to work efficiently with something like that? Would you recommend avoiding the Azure Python SDK and using the REST APIs instead?

    The question I have is why the Azure Portal displays a wrong or incomplete endpoint (or perhaps I don't yet understand how things are meant to work?!) and why the SDK cannot construct the required endpoint properly:

    [Screenshot 2024-03-12 133913]