speechsdk.SpeechRecognizer only works from ipynb notebook, cancels when run from .py script

van der Laan, Pepijn 0 Reputation points
2023-07-23T06:29:42.6+00:00

I have been closely following the MS speech recognition code examples, but I have run into an inconsistency in Azure Speech API behavior...

When I run the code below from an .ipynb notebook, it works and happily churns out recognition results.

import os
import time

# Set environment variables
from dotenv import load_dotenv

env_file = "secrets/.env"
load_dotenv(env_file, override=True)

# Set up config
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(
    subscription=os.getenv("SPEECH_KEY"), 
    region=os.getenv("SPEECH_REGION"))
speech_config.speech_recognition_language="en-GB"
speech_config.output_format = speechsdk.OutputFormat.Detailed

audio_config = speechsdk.audio.AudioConfig(filename="processing/actors.wav")

# Create a speech recognizer using the speech config and audio input config
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

# Module-level flag; the session-stopped callback sets it to True
done = False

# Handler for stopping event
def on_session_end(evt):
    """
    Callback that stops continuous recognition when the session-stopped event `evt` is received.
    """
    print(f"Session stopped on {evt}")
    # Stop recognition and close properly
    speech_recognizer.stop_continuous_recognition()
    global done
    done = True

speech_recognizer.session_stopped.connect(on_session_end)

# Add printed output to track what happens
speech_recognizer.recognizing.connect(lambda evt: print('RECOGNIZING: {}'.format(evt)))
speech_recognizer.recognized.connect(lambda evt: print('RECOGNIZED: {}'.format(evt)))
speech_recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
speech_recognizer.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
speech_recognizer.canceled.connect(lambda evt: print('CANCELED {}'.format(evt)))

speech_recognizer.start_continuous_recognition()

while not done:
    time.sleep(.5)

If I run the same code as a .py script from the command line (python -m speech2text), the Azure speech recognition API cancels before any text is recognized:

SESSION STARTED: SessionEventArgs(session_id=45ebbd28420941bba9b0cf868b7a7160)
CANCELED SpeechRecognitionCanceledEventArgs(session_id=45ebbd28420941bba9b0cf868b7a7160, result=SpeechRecognitionResult(result_id=4b078cacffe9426898c6ff605a8c3d31, text="", reason=ResultReason.Canceled))
Session stopped on SessionEventArgs(session_id=45ebbd28420941bba9b0cf868b7a7160)
SESSION STOPPED SessionEventArgs(session_id=45ebbd28420941bba9b0cf868b7a7160)
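
The CANCELED line above shows only the event object, not why recognition was cancelled. A minimal sketch of a more informative handler follows, assuming the event's `result` exposes `cancellation_details` with `reason` and `error_details`, as the SDK quickstarts use for recognition results. The helper name `describe_cancellation` is hypothetical and not part of the SDK.

```python
def describe_cancellation(evt):
    """Build a one-line summary of why recognition was cancelled.

    Assumes `evt.result.cancellation_details` carries `reason` and
    `error_details`; `describe_cancellation` is an illustrative helper.
    """
    details = evt.result.cancellation_details
    parts = [f"CANCELED: reason={details.reason}"]
    # error_details is only populated when the cancellation is an error
    if details.error_details:
        parts.append(f"error_details={details.error_details}")
    return ", ".join(parts)
```

In the script above it could replace the existing `canceled` lambda, for example `speech_recognizer.canceled.connect(lambda evt: print(describe_cancellation(evt)))`.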

Both runs are on the same Ubuntu machine, in VS Code, with the same conda environment, the same Python interpreter, and the same environment variables. I tried multiple .wav files with the same result.
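
One thing the list above does not cover is the working directory. The script uses the relative paths "secrets/.env" and "processing/actors.wav", which resolve against wherever the process was started. Whether that differs between the notebook and the command-line run is an assumption, not something confirmed here. The sketch below only checks where those paths point, and the helper names `resolve_from` and `check_inputs` are hypothetical.

```python
from pathlib import Path


def resolve_from(base_dir, relative_path):
    """Return the absolute path of relative_path anchored at base_dir."""
    return (Path(base_dir) / relative_path).resolve()


def check_inputs(base_dir, relative_paths):
    """Map each relative path to a tuple (absolute_path, exists)."""
    report = {}
    for rel in relative_paths:
        absolute = resolve_from(base_dir, rel)
        report[rel] = (absolute, absolute.exists())
    return report


if __name__ == "__main__":
    # Run this in both the notebook and the script and compare the output
    print("cwd:", Path.cwd())
    inputs = ["secrets/.env", "processing/actors.wav"]
    for rel, (absolute, exists) in check_inputs(Path.cwd(), inputs).items():
        print(f"{rel} -> {absolute} (exists: {exists})")
```

If the two runs print different directories, passing absolute paths to `load_dotenv` and `AudioConfig(filename=...)` would remove that difference.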

I am at a loss. Help would be appreciated.

Azure AI Speech
An Azure service that integrates speech processing into apps and services.
Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.

1 answer

  1. Ramr-msft 17,836 Reputation points
    2023-07-25T03:55:49.6266667+00:00

    @van der Laan, Pepijn Thanks for the details. I am able to execute the code successfully when I run it from the .py script, as shown below.

    [Image: User's image]

