speechsdk.SpeechRecognizer only works from ipynb notebook, cancels when run from .py script

van der Laan, Pepijn
2023-07-23T06:29:42.6+00:00

I have been closely following Microsoft's speech recognition code examples, but I have run into an inconsistency in Azure Speech API behavior...

When I run the code below from a .ipynb notebook, it works and happily churns out recognition results.

```python
import os
import time

# Load environment variables (SPEECH_KEY, SPEECH_REGION) from a .env file
from dotenv import load_dotenv

# Note: relative paths resolve against the current working directory
env_file = "secrets/.env"
load_dotenv(env_file, override=True)

# Set up config
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(
    subscription=os.getenv("SPEECH_KEY"),
    region=os.getenv("SPEECH_REGION"))
speech_config.speech_recognition_language = "en-GB"
speech_config.output_format = speechsdk.OutputFormat.Detailed

audio_config = speechsdk.audio.AudioConfig(filename="processing/actors.wav")

# Create a speech recognizer using the speech config and audio input config
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

done = False

# Handler for stopping event
def on_session_end(evt):
    """
    callback that signals to stop continuous recognition upon receiving an event `evt`
    """
    print(f"Session stopped on {evt}")
    global done
    done = True

speech_recognizer.session_stopped.connect(on_session_end)

# Add printed output to track what happens
speech_recognizer.recognizing.connect(lambda evt: print('RECOGNIZING: {}'.format(evt)))
speech_recognizer.recognized.connect(lambda evt: print('RECOGNIZED: {}'.format(evt)))
speech_recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
speech_recognizer.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
speech_recognizer.canceled.connect(lambda evt: print('CANCELED {}'.format(evt)))

speech_recognizer.start_continuous_recognition()

# Wait until the session_stopped callback sets `done`
while not done:
    time.sleep(.5)

# Stop recognition from the main thread rather than from inside the callback
speech_recognizer.stop_continuous_recognition()
```
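The `done` flag plus polling loop above works, but a `threading.Event` lets the main thread block until the SDK's callback thread reports that the session ended, and it calls `stop_continuous_recognition()` from the main thread instead of from inside a callback. This is a minimal sketch, assuming `recognizer` is a `speechsdk.SpeechRecognizer` like the one built above; the helper name `recognize_until_stopped` and its `timeout` parameter are hypothetical, not part of the SDK.

```python
import threading


def recognize_until_stopped(recognizer, timeout=None):
    """Run continuous recognition until session_stopped or canceled fires.

    `recognizer` is assumed to be a speechsdk.SpeechRecognizer (or any object
    exposing the same session_stopped/canceled signals and start/stop methods).
    Returns True if the session ended within `timeout` seconds, else False.
    `timeout=None` waits indefinitely. (Hypothetical helper, not an SDK API.)
    """
    finished = threading.Event()
    # The SDK invokes callbacks on its own thread; setting an Event is thread-safe
    recognizer.session_stopped.connect(lambda evt: finished.set())
    recognizer.canceled.connect(lambda evt: finished.set())

    recognizer.start_continuous_recognition()
    ended = finished.wait(timeout)
    # Stop from the calling thread once the session is over (or timed out)
    recognizer.stop_continuous_recognition()
    return ended
```

With the print handlers connected as in the question, `recognize_until_stopped(speech_recognizer, timeout=300)` would replace the `start_continuous_recognition()` call, the `while not done` loop, and the final stop call.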

If I run the same code as a .py script from the command line (`python -m speech2text`), the Azure speech recognition API cancels on me before any text is recognized:

```text
SESSION STARTED: SessionEventArgs(session_id=45ebbd28420941bba9b0cf868b7a7160)
CANCELED SpeechRecognitionCanceledEventArgs(session_id=45ebbd28420941bba9b0cf868b7a7160, result=SpeechRecognitionResult(result_id=4b078cacffe9426898c6ff605a8c3d31, text="", reason=ResultReason.Canceled))
Session stopped on SessionEventArgs(session_id=45ebbd28420941bba9b0cf868b7a7160)
SESSION STOPPED SessionEventArgs(session_id=45ebbd28420941bba9b0cf868b7a7160)
```
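The `CANCELED` line only shows `reason=ResultReason.Canceled`, which does not say *why* the session was canceled. The canceled event carries a `cancellation_details` object whose `reason` distinguishes, for example, `EndOfStream` from `Error`, and whose `error_details` text is populated when the reason is `Error`. A minimal sketch that formats those fields; the helper name `describe_cancellation` is hypothetical, and it reads the attributes duck-typed so it does not import the SDK itself:

```python
def describe_cancellation(evt):
    """Summarize why a recognition session was canceled.

    `evt` is expected to be a speechsdk.SpeechRecognitionCanceledEventArgs.
    Its `cancellation_details.reason` says why recognition stopped, and
    `cancellation_details.error_details` holds the service's error text
    when the reason is an error (it is empty otherwise).
    """
    details = evt.cancellation_details
    summary = f"CANCELED: reason={details.reason}"
    if details.error_details:
        summary += f"; error_details={details.error_details}"
    return summary
```

Connecting it in place of the existing `CANCELED` print, e.g. `speech_recognizer.canceled.connect(lambda evt: print(describe_cancellation(evt)))`, should show whether the script run fails with an error (and which one) or simply reaches the end of the audio stream.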

Both runs are on the same Ubuntu machine, in VS Code, with the same conda environment, the same Python interpreter, and the same environment variables. I tried multiple .wav files with the same result.
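One difference the description above does not rule out is the working directory: `"secrets/.env"` and `"processing/actors.wav"` are resolved against the current working directory, and by default VS Code starts notebook kernels in the notebook's own folder, while a terminal run uses wherever the shell currently is. If `load_dotenv` finds no file it silently loads nothing, so any previously exported `SPEECH_KEY`/`SPEECH_REGION` values would be used instead. A minimal diagnostic sketch to run from both the notebook and the command line and compare; the helper name `report_paths` is hypothetical:

```python
import os
from pathlib import Path


def report_paths(relative_paths, base_dir=None):
    """Map each relative path to (absolute path, exists).

    Paths are resolved against `base_dir`, or against the current working
    directory when `base_dir` is None. (Hypothetical diagnostic helper.)
    """
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    return {rel: ((base / rel).resolve(), (base / rel).exists())
            for rel in relative_paths}


if __name__ == "__main__":
    print("Current working directory:", Path.cwd())
    for rel, (absolute, exists) in report_paths(
            ["secrets/.env", "processing/actors.wav"]).items():
        print(f"{rel} -> {absolute} (exists: {exists})")
    # Report only whether the variables are set; never print the key itself
    for name in ("SPEECH_KEY", "SPEECH_REGION"):
        print(f"{name} set: {bool(os.getenv(name))}")
```

If the two runs disagree, anchoring the paths on the script's location removes the dependency on the shell's directory, e.g. `BASE_DIR = Path(__file__).resolve().parent`, then `load_dotenv(BASE_DIR / "secrets" / ".env", override=True)` and `AudioConfig(filename=str(BASE_DIR / "processing" / "actors.wav"))`, assuming `secrets/` and `processing/` sit next to the script.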

I am at a loss. Help would be appreciated.


1 answer

Ramr-msft
2023-07-25T03:55:49.6266667+00:00

@van der Laan, Pepijn Thanks for the details. I was able to run the code successfully from a .py script.