ValueError: cannot construct SpeechConfig with the given arguments
Hi, I'm encountering the error "cannot construct SpeechConfig with the given arguments" when I run my program. I'm following the steps from an online tutorial for an OpenAI chatbot using Azure Speech services, but this issue keeps popping up and I don't know how to solve it. I'm currently in Singapore and the resource is created in the East US region. Thank you.
Azure AI services
-
navba-MSFT 17,980 Reputation points • Microsoft Employee
2024-05-14T07:39:53.9166667+00:00 @Mikhael Johnson /DS Welcome to the Microsoft Q&A Forum, and thank you for posting your query here!
Regarding the error,
ValueError: can't construct SpeechConfig with the given arguments
(or a variation of this message). This error can occur, for example, when you run one of the Speech SDK for Python quickstarts without setting the environment variables. You might also see it when the environment variables are set to invalid values, such as an incorrect key or region. To resolve this issue, try the following steps:
- Ensure that you have set the environment variables correctly. You can refer to the official documentation for setting environment variables.
- Check if you have provided the correct subscription key and region in the environment variables. You can verify the subscription key and region from the Azure portal.
- Ensure that you have the necessary permissions and access rights to the Azure resources. If not, you can request the required permissions from the Azure administrator.
- Check if there are any network connectivity issues. You can try running the program on a different network or VPN.
More info here.
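As a quick sanity check for the first two steps above, you can fail fast with a clear message before constructing the `SpeechConfig`. This is a minimal sketch (the helper name and error message are illustrative, not part of the SDK), using the `SPEECH_KEY` / `SPEECH_REGION` variable names from this thread:

```python
import os

def get_speech_settings():
    """Read the Speech resource key and region from the environment,
    raising a clear error when either variable is missing or empty."""
    key = os.environ.get("SPEECH_KEY")
    region = os.environ.get("SPEECH_REGION")
    missing = [name for name, value in
               (("SPEECH_KEY", key), ("SPEECH_REGION", region)) if not value]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return key, region
```

Calling this at startup turns the opaque "cannot construct SpeechConfig" failure into a message that names the exact variable that did not load.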
If the above steps don't help, please share your sample code with me and I will debug it at my end.
Hope this helps. If you have any follow-up questions, please let me know. I would be happy to help.
-
Mikhael Johnson /DS 0 Reputation points
2024-05-14T14:52:01.8566667+00:00 Here is the code where the error occurs:
import azure.cognitiveservices.speech as speechsdk
import time
from datetime import datetime
# from main import settings, already_spoken, output_folder
from sounds import play_sound
import simpleaudio as sa
from dotenv import load_dotenv
import os
import sounddevice as sd

# List the available audio devices and their IDs
devices = sd.query_devices()
for i, device in enumerate(devices):
    # print(device)
    print(f"Device {i}: {device['name']} (ID: {device['index']})")

load_dotenv(override=True)

settings = {
    'speechKey': os.environ.get('SPEECH_KEY'),
    'region': os.environ.get('SPEECH_REGION'),
    'language': os.environ.get('SPEECH_LANGUAGE'),
    'openAIKey': os.environ.get('OPENAI_KEY')
}

prop = False

# Some sounds need to be generated over and over, like "thank you" or "I didn't get that".
already_spoken = {}


def Start_recording(output_folder):
    # Creates an instance of a speech config with the specified subscription key and service region.
    speech_config = speechsdk.SpeechConfig(
        subscription=settings['speechKey'], region=settings['region'])
    speech_config.request_word_level_timestamps()
    speech_config.set_property(
        property_id=speechsdk.PropertyId.SpeechServiceResponse_OutputFormatOption,
        value="detailed")

    # Creates a speech recognizer using the default microphone (built-in).
    audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)
    speech_recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config)

    # Initialize some variables
    results = []
    done = False

    # Update the last time speech was detected.
    def speech_detected():
        nonlocal lastSpoken
        lastSpoken = int(datetime.now().timestamp() * 1000)

    # Event handler to add each recognition result to the result list
    def handleResult(evt):
        import json
        nonlocal results
        nonlocal lastSpoken
        results.append(json.loads(evt.result.json))
        # print the result (optional, otherwise it can run for a few minutes without output)
        # print('RECOGNIZED: {}'.format(evt))
        speech_detected()
        # result object
        res = {'text': evt.result.text,
               'timestamp': evt.result.offset,
               'duration': evt.result.duration,
               'raw': evt.result}
        if evt.result.text != "":
            results.append(res)
        # print(evt.result)

    # Event handler to check if the recognizer is done
    def stop_cb(evt):
        # print('CLOSING on {}'.format(evt))
        speech_recognizer.stop_continuous_recognition()
        nonlocal done
        done = True

    # Connect callbacks to the events fired by the speech recognizer & display the info/status
    # Ref: https://docs.microsoft.com/en-us/python/api/azure-cognitiveservices-speech/azure.cognitiveservices.speech.eventsignal?view=azure-python
    speech_recognizer.recognizing.connect(lambda evt: speech_detected())
    # speech_recognizer.recognized.connect(lambda evt: print('RECOGNIZED: {}'.format(evt)))
    speech_recognizer.session_started.connect(
        lambda evt: print('SESSION STARTED: {}'.format(evt)))
    speech_recognizer.session_stopped.connect(
        lambda evt: print('SESSION STOPPED {}'.format(evt)))
    speech_recognizer.canceled.connect(
        lambda evt: print('CANCELED {}'.format(evt)))
    speech_recognizer.recognized.connect(handleResult)
    speech_recognizer.session_stopped.connect(stop_cb)
    speech_recognizer.canceled.connect(stop_cb)

    # Start speech recognition
    result_future = speech_recognizer.start_continuous_recognition_async()
    result_future.get()

    # Play sound to indicate that the recording session is on.
    play_sound()
    lastSpoken = int(datetime.now().timestamp() * 1000)

    # Wait for speech recognition to complete
    while not done:
        time.sleep(1)
        now = int(datetime.now().timestamp() * 1000)
        inactivity = now - lastSpoken
        # print(inactivity)
        # After 1 second of no speech detected, play a sound to indicate the recording session could close.
        if inactivity > 1000:
            play_sound()
        if inactivity > 3000:
            # Close the recording session if no input is detected after 3s
            print('Stopping async recognition.')
            speech_recognizer.stop_continuous_recognition_async()
            speak("Thank you!")
            while not done:
                time.sleep(1)

    output = ""
    for res in results:
        output += res['NBest'][0]['Display']
    return results


def speak(text, silent=False, output_folder="./Output"):
    if text in already_spoken:
        # if the speech was already synthesized
        if not silent:
            play_obj = sa.WaveObject.from_wave_file(
                already_spoken[text]).play()
            play_obj.wait_done()
        return

    speech_config = speechsdk.SpeechConfig(
        subscription=settings['speechKey'], region=settings['region'])
    # audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)
    file_name = f'{output_folder}/{datetime.now().strftime("%Y%m%d_%H%M%S")}.wav'
    audio_config = speechsdk.audio.AudioOutputConfig(
        use_default_speaker=True, filename=file_name)

    # The language of the voice that speaks.
    speech_config.speech_synthesis_voice_name = 'en-US-JennyNeural'
    speech_synthesizer = speechsdk.SpeechSynthesizer(
        speech_config=speech_config, audio_config=audio_config)
    speech_synthesis_result = speech_synthesizer.speak_text(text)  # .get()

    if speech_synthesis_result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
        print("Speech synthesized for text [{}]".format(text))
        if not silent:
            play_obj = sa.WaveObject.from_wave_file(file_name).play()
            play_obj.wait_done()
        already_spoken[text] = file_name
    elif speech_synthesis_result.reason == speechsdk.ResultReason.Canceled:
        cancellation_details = speech_synthesis_result.cancellation_details
        print("Speech synthesis canceled: {}".format(
            cancellation_details.reason))
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            if cancellation_details.error_details:
                print("Error details: {}".format(
                    cancellation_details.error_details))
            print("Did you set the speech resource key and region values?")


def speak_ssml(text):
    speech_config = speechsdk.SpeechConfig(
        subscription=settings['speechKey'], region=settings['region'])
    # audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)

    # The language of the voice that speaks.
    speech_config.speech_synthesis_voice_name = 'en-US-JennyNeural'
    speech_synthesizer = speechsdk.SpeechSynthesizer(
        speech_config=speech_config, audio_config=None)
    speech_synthesis_result = speech_synthesizer.speak_ssml(
        text)  # .speak_text(text) #.get()

    if speech_synthesis_result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
        print("Speech synthesized for text [{}]".format(text))
    elif speech_synthesis_result.reason == speechsdk.ResultReason.Canceled:
        cancellation_details = speech_synthesis_result.cancellation_details
        print("Speech synthesis canceled: {}".format(
            cancellation_details.reason))
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            if cancellation_details.error_details:
                print("Error details: {}".format(
                    cancellation_details.error_details))
            print("Did you set the speech resource key and region values?")
Above is where I set up the config for the speech recognizer and read my environment variables.
Below is how I set up the .env file; I am not sure if I did it right:

SPEECH_KEY=f3ad345440a5ade1a36e65cd25ba
SPEECH_REGION=eastus
SPEECH_LANGUAGE=en-US
OPENAI_KEY=sk-proj-9TNWfO7oxczLtqErqMNsT3BlbkFJ3REBvLS2C4msZ7cqKa8i
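For reference, python-dotenv expects one KEY=VALUE pair per line, with no spaces around the equals sign and no quotes required. Assuming the variable names used in the code above, a well-formed .env file would look like this (the values below are placeholders):

```
SPEECH_KEY=<your-speech-resource-key>
SPEECH_REGION=eastus
SPEECH_LANGUAGE=en-US
OPENAI_KEY=<your-openai-key>
```

Note that `load_dotenv()` with no path argument searches for the .env file starting from the current working directory, so the file must be located where the script is run from (or you must pass its path explicitly).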
-
navba-MSFT 17,980 Reputation points • Microsoft Employee
2024-05-16T04:32:31.1866667+00:00 @Mikhael Johnson /DS Thanks for getting back. Try hardcoding the values instead of reading them from environment variables, as shown below, and check if that helps.
settings = {
    'speechKey': "f3ad345440a5ade1a36e65cd25ba",
    'region': "eastus",
    'language': "en-US",
    'openAIKey': "sk-proj-9TNWfO7oxczLtqErqMNsT3BlbkFJ3REBvLS2C4msZ7cqKa8i"
}
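If hardcoding makes the error go away, the environment variables were not loading. A stdlib-only way to check what the process actually resolved, without printing the full secrets, is a small masking helper (the `mask` function below is illustrative, not part of the SDK):

```python
import os

def mask(secret, show=4):
    """Return a safely printable form of a secret: the first `show`
    characters plus the total length, or a marker when it is missing."""
    if not secret:
        return "<missing>"
    return f"{secret[:show]}... ({len(secret)} chars)"

# Report what the process resolved without leaking full keys.
for name in ("SPEECH_KEY", "SPEECH_REGION"):
    print(f"{name} = {mask(os.environ.get(name))}")
```

If this prints `<missing>` for either variable, the .env file is not being found or parsed, which would explain the SpeechConfig constructor failure.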
Awaiting your reply.