speechsdk.SpeechRecognizer only works from ipynb notebook, cancels when run from .py script

Question

I have been closely following the MS speech recognition code examples, but I run into an inconsistency in Azure Speech API behavior...

When I run the code below from an .ipynb notebook, it works and happily churns out recognition results.

import os
import time

# Set environment variables
from dotenv import load_dotenv

env_file = "secrets/.env"
load_dotenv(env_file, override=True)

# Set up config
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(
    subscription=os.getenv("SPEECH_KEY"), 
    region=os.getenv("SPEECH_REGION"))
speech_config.speech_recognition_language="en-GB"
speech_config.output_format = speechsdk.OutputFormat.Detailed

audio_config = speechsdk.audio.AudioConfig(filename="processing/actors.wav")

# Create a speech recognizer using the speech config and audio input config
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

# Flag set by the session-stopped handler to end the wait loop below
done = False

# Handler for stopping event
def on_session_end(evt):
    """
    Callback that stops continuous recognition when the event `evt` is received.
    """
    print(f"Session stopped on {evt}")
    # Stop recognition and close properly
    speech_recognizer.stop_continuous_recognition()
    global done
    done = True

speech_recognizer.session_stopped.connect(on_session_end)

# Add printed output to track what happens
speech_recognizer.recognizing.connect(lambda evt: print('RECOGNIZING: {}'.format(evt)))
speech_recognizer.recognized.connect(lambda evt: print('RECOGNIZED: {}'.format(evt)))
speech_recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
speech_recognizer.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
speech_recognizer.canceled.connect(lambda evt: print('CANCELED {}'.format(evt)))

speech_recognizer.start_continuous_recognition()

while not done:
    time.sleep(.5)

If I run the same code as a .py script from the command line (python -m speech2text), the Azure speech recognition API cancels on me before any text is recognized:

SESSION STARTED: SessionEventArgs(session_id=45ebbd28420941bba9b0cf868b7a7160)
CANCELED SpeechRecognitionCanceledEventArgs(session_id=45ebbd28420941bba9b0cf868b7a7160, result=SpeechRecognitionResult(result_id=4b078cacffe9426898c6ff605a8c3d31, text="", reason=ResultReason.Canceled))
Session stopped on SessionEventArgs(session_id=45ebbd28420941bba9b0cf868b7a7160)
SESSION STOPPED SessionEventArgs(session_id=45ebbd28420941bba9b0cf868b7a7160)
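
The CANCELED line above does not say why the session was canceled. A minimal sketch of a handler that prints the reason is shown here. It assumes the canceled event exposes `cancellation_details` with `reason`, `code` and `error_details` attributes, as in the Speech SDK samples; `describe_cancellation` is a hypothetical helper name, and it reads the attributes defensively so a missing one shows up as `None` rather than raising.

```python
def describe_cancellation(evt):
    """Return a readable summary of a `canceled` event.

    Duck-typed on purpose: it only reads `cancellation_details` and that
    object's `reason`, `code` and `error_details` attributes (assumed names).
    """
    details = getattr(evt, "cancellation_details", None)
    if details is None:
        return f"CANCELED (no cancellation details on event): {evt}"
    reason = getattr(details, "reason", None)
    code = getattr(details, "code", None)
    error_details = getattr(details, "error_details", "") or "<none>"
    return f"CANCELED reason={reason} code={code} error_details={error_details}"


# Hooking it up to the recognizer from the script above, in place of the
# existing `canceled` lambda:
# speech_recognizer.canceled.connect(lambda evt: print(describe_cancellation(evt)))
```

If the reason turns out to be an error, the error details text usually narrows things down (for example to authentication, connectivity, or a problem reading the audio input).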

Both runs are on the same Ubuntu machine, in VS Code, with the same conda environment, the same Python interpreter and the same environment variables. I tried multiple .wav files with the same result.

I am at a loss. Help would be appreciated.

Answer

@van der Laan, Pepijn Thanks for the details. I was able to execute the code successfully when running it as a .py script, as shown below.

User's image
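
If it still cancels on your side, one thing that can differ between a notebook and a script is the current working directory, and the relative paths "secrets/.env" and "processing/actors.wav" are resolved against it. This is only one possible cause, not a confirmed diagnosis. The sketch below uses a hypothetical `preflight` helper to check that the files and environment variables the script relies on are actually visible from where it runs.

```python
import os
from pathlib import Path


def preflight(paths, env_vars, base_dir=None):
    """Return a list of problems found; an empty list means all checks passed.

    `paths` are resolved against `base_dir`, or against the current working
    directory when `base_dir` is None.
    """
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    problems = []
    for p in paths:
        resolved = (base / p).resolve()
        if not resolved.is_file():
            problems.append(f"missing file: {resolved}")
    for name in env_vars:
        if not os.getenv(name):
            problems.append(f"environment variable not set or empty: {name}")
    return problems


# Usage in the script, placed after load_dotenv(...) so the variables from
# the .env file have been loaded:
# print("Working directory:", Path.cwd())
# for problem in preflight(["secrets/.env", "processing/actors.wav"],
#                          ["SPEECH_KEY", "SPEECH_REGION"]):
#     print("PROBLEM:", problem)
```

Running the same check in a notebook cell and in the script makes any difference in working directory or loaded variables visible right away.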
