ValueError: cannot construct SpeechConfig with the given arguments
Hi, I'm encountering the error "cannot construct SpeechConfig with the given arguments" when I run my program. I'm following the steps from an online tutorial for an OpenAI chatbot using Azure Speech services, but this issue keeps popping up and I don't know how to solve it. I'm currently in Singapore and the resource is created in the East US region. Thank you.
Azure AI services
-
navba-MSFT 17,980 Reputation points • Microsoft Employee
2024-05-14T07:39:53.9166667+00:00 @Mikhael Johnson /DS Welcome to the Microsoft Q&A Forum, and thank you for posting your query here!
Regarding the error,
ValueError: can't construct SpeechConfig with the given arguments
(or a variation of this message). This error can occur, for example, when you run one of the Speech SDK for Python quickstarts without setting the environment variables. You might also see it when the environment variables are set to invalid values, such as an incorrect key or region. To resolve this issue, try the following steps:
- Ensure that you have set the environment variables correctly. You can refer to the official documentation for setting environment variables.
- Check if you have provided the correct subscription key and region in the environment variables. You can verify the subscription key and region from the Azure portal.
- Ensure that you have the necessary permissions and access rights to the Azure resources. If not, you can request the required permissions from the Azure administrator.
- Check if there are any network connectivity issues. You can try running the program on a different network or VPN.
More info here.
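As a quick sanity check for the first two steps above, you can fail fast with a clear message before constructing the `SpeechConfig`. This is a minimal sketch (the helper name and error message are illustrative, not part of the SDK), using the `SPEECH_KEY` / `SPEECH_REGION` variable names from this thread:

```python
import os

def get_speech_settings():
    """Read the Speech resource key and region from the environment,
    raising a clear error when either variable is missing or empty."""
    key = os.environ.get("SPEECH_KEY")
    region = os.environ.get("SPEECH_REGION")
    missing = [name for name, value in
               (("SPEECH_KEY", key), ("SPEECH_REGION", region)) if not value]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return key, region
```

Calling this at startup turns the opaque "cannot construct SpeechConfig" failure into a message that names the exact variable that did not load.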
If the above steps don't help, please share your sample code with me and I will debug it at my end.
Hope this helps. If you have any follow-up questions, please let me know. I would be happy to help.
-
Mikhael Johnson /DS 0 Reputation points
2024-05-14T14:52:01.8566667+00:00 Here is the code where the error occurs:
import azure.cognitiveservices.speech as speechsdk
import time
from datetime import datetime
# from main import settings, already_spoken, output_folder
from sounds import play_sound
import simpleaudio as sa
from dotenv import load_dotenv
import os
import sounddevice as sd

# List the available audio devices and their IDs
devices = sd.query_devices()
for i, device in enumerate(devices):
    # print(device)
    print(f"Device {i}: {device['name']} (ID: {device['index']})")

load_dotenv(override=True)

settings = {
    'speechKey': os.environ.get('SPEECH_KEY'),
    'region': os.environ.get('SPEECH_REGION'),
    'language': os.environ.get('SPEECH_LANGUAGE'),
    'openAIKey': os.environ.get('OPENAI_KEY')
}

prop = False

# Some sounds need to be generated over and over, like "thank you" or "I didn't get that".
already_spoken = {}


def Start_recording(output_folder):
    # Creates an instance of a speech config with the specified subscription key and service region.
    speech_config = speechsdk.SpeechConfig(
        subscription=settings['speechKey'], region=settings['region'])
    speech_config.request_word_level_timestamps()
    speech_config.set_property(
        property_id=speechsdk.PropertyId.SpeechServiceResponse_OutputFormatOption,
        value="detailed")

    # Creates a speech recognizer using the default microphone (built-in).
    audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)
    speech_recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config)

    # Initialize some variables
    results = []
    done = False

    # Update the last time speech was detected.
    def speech_detected():
        nonlocal lastSpoken
        lastSpoken = int(datetime.now().timestamp() * 1000)

    # Event handler to add each recognition result to the result list
    def handleResult(evt):
        import json
        nonlocal results
        nonlocal lastSpoken
        results.append(json.loads(evt.result.json))
        # print the result (optional, otherwise it can run for a few minutes without output)
        # print('RECOGNIZED: {}'.format(evt))
        speech_detected()
        # result object
        res = {'text': evt.result.text,
               'timestamp': evt.result.offset,
               'duration': evt.result.duration,
               'raw': evt.result}
        if evt.result.text != "":
            results.append(res)
        # print(evt.result)

    # Event handler to check if the recognizer is done
    def stop_cb(evt):
        # print('CLOSING on {}'.format(evt))
        speech_recognizer.stop_continuous_recognition()
        nonlocal done
        done = True

    # Connect callbacks to the events fired by the speech recognizer & display the info/status
    # Ref: https://docs.microsoft.com/en-us/python/api/azure-cognitiveservices-speech/azure.cognitiveservices.speech.eventsignal?view=azure-python
    speech_recognizer.recognizing.connect(lambda evt: speech_detected())
    # speech_recognizer.recognized.connect(lambda evt: print('RECOGNIZED: {}'.format(evt)))
    speech_recognizer.session_started.connect(
        lambda evt: print('SESSION STARTED: {}'.format(evt)))
    speech_recognizer.session_stopped.connect(
        lambda evt: print('SESSION STOPPED {}'.format(evt)))
    speech_recognizer.canceled.connect(
        lambda evt: print('CANCELED {}'.format(evt)))
    speech_recognizer.recognized.connect(handleResult)
    speech_recognizer.session_stopped.connect(stop_cb)
    speech_recognizer.canceled.connect(stop_cb)

    # Start speech recognition
    result_future = speech_recognizer.start_continuous_recognition_async()
    result_future.get()

    # Play sound to indicate that the recording session is on.
    play_sound()
    lastSpoken = int(datetime.now().timestamp() * 1000)

    # Wait for speech recognition to complete
    while not done:
        time.sleep(1)
        now = int(datetime.now().timestamp() * 1000)
        inactivity = now - lastSpoken
        # print(inactivity)
        # After 1 second of no speech detected, play a sound to indicate the recording session could close.
        if inactivity > 1000:
            play_sound()
        if inactivity > 3000:
            # Close the recording session if no input is detected after 3s
            print('Stopping async recognition.')
            speech_recognizer.stop_continuous_recognition_async()
            speak("Thank you!")
            while not done:
                time.sleep(1)

    output = ""
    for res in results:
        output += res['NBest'][0]['Display']
    return results


def speak(text, silent=False, output_folder="./Output"):
    if text in already_spoken:
        # if the speech was already synthesized
        if not silent:
            play_obj = sa.WaveObject.from_wave_file(
                already_spoken[text]).play()
            play_obj.wait_done()
        return

    speech_config = speechsdk.SpeechConfig(
        subscription=settings['speechKey'], region=settings['region'])
    # audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)
    file_name = f'{output_folder}/{datetime.now().strftime("%Y%m%d_%H%M%S")}.wav'
    audio_config = speechsdk.audio.AudioOutputConfig(
        use_default_speaker=True, filename=file_name)

    # The language of the voice that speaks.
    speech_config.speech_synthesis_voice_name = 'en-US-JennyNeural'
    speech_synthesizer = speechsdk.SpeechSynthesizer(
        speech_config=speech_config, audio_config=audio_config)
    speech_synthesis_result = speech_synthesizer.speak_text(text)  # .get()

    if speech_synthesis_result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
        print("Speech synthesized for text [{}]".format(text))
        if not silent:
            play_obj = sa.WaveObject.from_wave_file(file_name).play()
            play_obj.wait_done()
        already_spoken[text] = file_name
    elif speech_synthesis_result.reason == speechsdk.ResultReason.Canceled:
        cancellation_details = speech_synthesis_result.cancellation_details
        print("Speech synthesis canceled: {}".format(
            cancellation_details.reason))
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            if cancellation_details.error_details:
                print("Error details: {}".format(
                    cancellation_details.error_details))
            print("Did you set the speech resource key and region values?")


def speak_ssml(text):
    speech_config = speechsdk.SpeechConfig(
        subscription=settings['speechKey'], region=settings['region'])
    # audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)

    # The language of the voice that speaks.
    speech_config.speech_synthesis_voice_name = 'en-US-JennyNeural'
    speech_synthesizer = speechsdk.SpeechSynthesizer(
        speech_config=speech_config, audio_config=None)
    speech_synthesis_result = speech_synthesizer.speak_ssml(
        text)  # .speak_text(text) #.get()

    if speech_synthesis_result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
        print("Speech synthesized for text [{}]".format(text))
    elif speech_synthesis_result.reason == speechsdk.ResultReason.Canceled:
        cancellation_details = speech_synthesis_result.cancellation_details
        print("Speech synthesis canceled: {}".format(
            cancellation_details.reason))
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            if cancellation_details.error_details:
                print("Error details: {}".format(
                    cancellation_details.error_details))
            print("Did you set the speech resource key and region values?")
Above is where I set up the config for the speech recognizer and read my environment variables.
Below is how I set up the .env file; I am not sure if I did it right:

SPEECH_KEY=f3ad345440a5ade1a36e65cd25ba
SPEECH_REGION=eastus
SPEECH_LANGUAGE=en-US
OPENAI_KEY=sk-proj-9TNWfO7oxczLtqErqMNsT3BlbkFJ3REBvLS2C4msZ7cqKa8i
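For reference, python-dotenv expects one KEY=VALUE pair per line, with no spaces around the equals sign and no quotes required. Assuming the variable names used in the code above, a well-formed .env file would look like this (the values below are placeholders):

```
SPEECH_KEY=<your-speech-resource-key>
SPEECH_REGION=eastus
SPEECH_LANGUAGE=en-US
OPENAI_KEY=<your-openai-key>
```

Note that `load_dotenv()` with no path argument searches for the .env file starting from the current working directory, so the file must be located where the script is run from (or you must pass its path explicitly).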
-
navba-MSFT 17,980 Reputation points • Microsoft Employee
2024-05-16T04:32:31.1866667+00:00 @Mikhael Johnson /DS Thanks for getting back. Try hardcoding the values instead of reading them from environment variables, as shown below, and check if that helps.
settings = {
    'speechKey': "f3ad345440a5ade1a36e65cd25ba",
    'region': "eastus",
    'language': "en-US",
    'openAIKey': "sk-proj-9TNWfO7oxczLtqErqMNsT3BlbkFJ3REBvLS2C4msZ7cqKa8i"
}
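If hardcoding makes the error go away, the environment variables were not loading. A stdlib-only way to check what the process actually resolved, without printing the full secrets, is a small masking helper (the `mask` function below is illustrative, not part of the SDK):

```python
import os

def mask(secret, show=4):
    """Return a safely printable form of a secret: the first `show`
    characters plus the total length, or a marker when it is missing."""
    if not secret:
        return "<missing>"
    return f"{secret[:show]}... ({len(secret)} chars)"

# Report what the process resolved without leaking full keys.
for name in ("SPEECH_KEY", "SPEECH_REGION"):
    print(f"{name} = {mask(os.environ.get(name))}")
```

If this prints `<missing>` for either variable, the .env file is not being found or parsed, which would explain the SpeechConfig constructor failure.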
Awaiting your reply.