How to save speech-to-text output from continuous recognition to a file?

De Wit Wendy 20 Reputation points
2023-02-17T17:09:10.53+00:00

Hi,

I'm using Speech to Text on a *.wav file from within Azure Machine Learning studio. Since the wav file contains a couple of minutes of speech, I'm using continuous recognition. The speech-to-text part works. However, the result only appears on screen, and I can't manage to save it to a file. The file is created, but it is empty.

Can you see what I'm missing in the recognized_cb function? My code is written in Python.

def stt_run4(wav_file_path, taal, key, regio, outputfile):
      
    speech_config = speechsdk.SpeechConfig(subscription=key, region=regio)
    speech_config.speech_recognition_language=taal
    audio_config = speechsdk.audio.AudioConfig(filename=wav_file_path)

    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)
    
    done = False


    # Set up the output file for the transcript
    output_file = open(outputfile, "w")

    def stop_cb(evt):
        """callback that signals to stop continuous recognition upon receiving an event `evt`"""
        print('CLOSING on {}'.format(evt))
            # Close the output file and stop the continuous recognition session
        output_file.close()
        speech_recognizer.stop_continuous_recognition()
        print("Transcript saved in file:", outputfile)
        nonlocal done
        done = True
        
    def recognized_cb(evt : speechsdk.SpeechRecognitionEventArgs) :
        if speechsdk.ResultReason.RecognizingSpeech == evt.result.reason and len(evt.result.text) > 0 :
            print('RECOGNIZED:', evt.result.text)
            output_file.write(evt.result.text)
            output_file.flush()
            


    # Connect callbacks to the events fired by the speech recognizer
    speech_recognizer.recognizing.connect(lambda evt: print('RECOGNIZING: {}'.format(evt)))
    speech_recognizer.recognized.connect(recognized_cb)
    speech_recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
    speech_recognizer.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
    speech_recognizer.canceled.connect(lambda evt: print('CANCELED {}'.format(evt)))
    # stop continuous recognition on either session stopped or canceled events
    speech_recognizer.session_stopped.connect(stop_cb)
    speech_recognizer.canceled.connect(stop_cb)

    # Start continuous speech recognition
    result=speech_recognizer.start_continuous_recognition()
  
    
    while not done:
        time.sleep(.5)

    
    return 
Azure AI Speech

Accepted answer
  1. VasaviLankipalle-MSFT 17,641 Reputation points
    2023-02-17T23:46:40.2766667+00:00

    Hi @De Wit Wendy, thanks for using the Microsoft Q&A platform.

    The problem is the if condition in the recognized_cb() function: "speechsdk.ResultReason.RecognizingSpeech == evt.result.reason" needs to be changed to "speechsdk.ResultReason.RecognizedSpeech == evt.result.reason". The recognized event delivers final results, so a check for RecognizingSpeech inside that callback is never true and nothing is written to the file.

    evt.result.reason == speechsdk.ResultReason.RecognizedSpeech checks whether the result is final, meaning the recognizer has finished processing that segment of audio and has produced a final recognition result for it.

    evt.result.reason == speechsdk.ResultReason.RecognizingSpeech checks whether the recognizer is still processing a speech segment and has produced only a partial recognition result for it.
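To make the distinction concrete, here is a minimal sketch that runs without the Speech SDK. `Reason`, `fake_event`, and `make_final_only_handler` are hypothetical names invented for this illustration; in real code the comparison would be against speechsdk.ResultReason.RecognizedSpeech as described above. The sketch only shows that a handler filtering on the final reason skips partial hypotheses and keeps final ones.

```python
import io
from enum import Enum, auto
from types import SimpleNamespace


class Reason(Enum):
    """Stand-in for speechsdk.ResultReason (illustration only)."""
    RecognizingSpeech = auto()  # partial hypothesis that may still change
    RecognizedSpeech = auto()   # final result for a finished segment


def fake_event(reason, text):
    """Build an object shaped like the SDK event: evt.result.reason / evt.result.text."""
    return SimpleNamespace(result=SimpleNamespace(reason=reason, text=text))


def make_final_only_handler(sink):
    """Return a callback that writes only final, non-empty results to `sink`."""
    def handler(evt):
        if evt.result.reason == Reason.RecognizedSpeech and evt.result.text:
            sink.write(evt.result.text + "\n")
    return handler


buffer = io.StringIO()
handler = make_final_only_handler(buffer)
handler(fake_event(Reason.RecognizingSpeech, "hello wor"))    # ignored: partial
handler(fake_event(Reason.RecognizedSpeech, "Hello world."))  # written: final
transcript = buffer.getvalue()
```

Writing a newline after each result is a choice made in this sketch so that segments do not run together; the code in this thread writes the text without a separator.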

    Please make the following code changes so that you can see the desired results.

    def recognized_cb(evt: speechsdk.SpeechRecognitionEventArgs):
        if speechsdk.ResultReason.RecognizedSpeech == evt.result.reason and len(evt.result.text) > 0:
            print('RECOGNIZED:', evt.result.text)
            output_file.write(evt.result.text)
            output_file.flush()
    
    

    Following these changes, your final updated/working code should look like this.

    
    import time

    import azure.cognitiveservices.speech as speechsdk

    def stt_run4(wav_file_path, taal, key, regio, outputfile):
          
        speech_config = speechsdk.SpeechConfig(subscription=key, region=regio)
        speech_config.speech_recognition_language=taal
        audio_config = speechsdk.audio.AudioConfig(filename=wav_file_path)
    
        speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)
        
        done = False
    
    
        # Set up the output file for the transcript
        output_file = open(outputfile, "w")
    
        def stop_cb(evt):
            """callback that signals to stop continuous recognition upon receiving an event `evt`"""
            print('CLOSING on {}'.format(evt))
            # Close the output file and stop the continuous recognition session
            output_file.close()
            speech_recognizer.stop_continuous_recognition()
            print("Transcript saved in file:", outputfile)
            nonlocal done
            done = True
            
        def recognized_cb(evt: speechsdk.SpeechRecognitionEventArgs) :
            if speechsdk.ResultReason.RecognizedSpeech==evt.result.reason and len(evt.result.text) > 0 :
                print('RECOGNIZED:', evt.result.text)
                output_file.write(evt.result.text)
                output_file.flush()
                
    
    
        # Connect callbacks to the events fired by the speech recognizer
        speech_recognizer.recognizing.connect(lambda evt: print('RECOGNIZING: {}'.format(evt)))
        speech_recognizer.recognized.connect(recognized_cb)
        speech_recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
        speech_recognizer.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
        speech_recognizer.canceled.connect(lambda evt: print('CANCELED {}'.format(evt)))
        # stop continuous recognition on either session stopped or canceled events
        speech_recognizer.session_stopped.connect(stop_cb)
        speech_recognizer.canceled.connect(stop_cb)
    
        # Start continuous speech recognition
        speech_recognizer.start_continuous_recognition()
      
        
        while not done:
            time.sleep(.5) 
    
    

    This final code works for me. Please try and let us know how it goes on your end.
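As a possible variation on the code above, the following sketch waits on a `threading.Event` instead of polling a flag, and lets a `with` block close the file in the calling thread rather than inside a callback. `transcribe_to_file` and its parameters are hypothetical names, not part of the Speech SDK. The sketch assumes `recognizer` exposes the same members used in this thread (`recognized`, `session_stopped`, `canceled`, `start_continuous_recognition()`, `stop_continuous_recognition()`) and that no recognized callback arrives after `stop_continuous_recognition()` returns; it has not been verified against the service.

```python
import threading


def transcribe_to_file(recognizer, output_path, final_reason, timeout=None):
    """Run continuous recognition and write each final result on its own line.

    `final_reason` would be speechsdk.ResultReason.RecognizedSpeech in real use.
    Returns True if the session ended by itself, False if `timeout` expired.
    """
    finished = threading.Event()

    with open(output_path, "w", encoding="utf-8") as out:
        def on_recognized(evt):
            # Persist only final, non-empty results.
            if evt.result.reason == final_reason and evt.result.text:
                out.write(evt.result.text + "\n")
                out.flush()

        recognizer.recognized.connect(on_recognized)
        # The callbacks only set a flag; stopping and closing happen below,
        # in the thread that called this function.
        recognizer.session_stopped.connect(lambda evt: finished.set())
        recognizer.canceled.connect(lambda evt: finished.set())

        recognizer.start_continuous_recognition()
        completed = finished.wait(timeout)
        recognizer.stop_continuous_recognition()
    return completed
```

With the SDK objects from the code above, a call might look like transcribe_to_file(speech_recognizer, outputfile, speechsdk.ResultReason.RecognizedSpeech).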

    I hope this helps.

    Regards,
    Vasavi

    Please accept the answer and vote 'Yes' if you found it helpful, to support the community. Thanks.

