Speech-To-Text sends back recognised text every 30s, how can I extend this duration?

Ye Yutong
2023-03-09T04:42:26.55+00:00

I am using the Microsoft.CognitiveServices.Speech library in Unity.

Desired behaviour: When the button is clicked, speech-to-text recognition starts if it is not already running, and stops if it is.

Actual behaviour: When I speak for longer than 30s, the message only shows the first 30s of what was spoken. When the next 30s is recognised, its text replaces the first 30s message. So while recognition is running (before the button is pressed again to stop it), the message only ever holds about 30s of speech at a time. Recognition succeeds and is not cancelled, as result.Reason always shows RecognizedSpeech.

How can I have speech longer than 30s recognised in one go?

using UnityEngine;
using UnityEngine.UI;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using TMPro;
public class SpeechToText : MonoBehaviour
{
    public TextMeshProUGUI outputText;
    public Button startRecordButton;
    SpeechRecognizer recognizer;
    SpeechConfig speechConfig;
    AudioConfig audioConfig;
    private object threadLocker = new object();
    private bool speechStarted = false;
    public string message;
    public bool sentenceIsRecognized = false;
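    // Raised by the SDK each time a recognition result is finalized.
    private void RecognizedHandler(object sender, SpeechRecognitionEventArgs e)
    {
        lock (threadLocker)
        {
            // Note: this assignment overwrites the text of any earlier result.
            message = e.Result.Text;
            Debug.Log("threadlocker message :" + message);
            Debug.Log("Result reason:" + e.Result.Reason);
        }
    }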
    public async void ButtonClick()
    {
        if (ChatGPTTester.executeCount >= 2) {
            if (speechStarted)
            {
                await recognizer.StopContinuousRecognitionAsync().ConfigureAwait(false);
                lock(threadLocker)
                {
                    speechStarted = false;
                    Debug.Log("STT message :" + message);
                    
                    sentenceIsRecognized = true;
                }
            }
            else
            {
                await recognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);
                lock (threadLocker)
                {
                    speechStarted = true;
                }
            }
        }
    }
    void Start()
    {
        startRecordButton.onClick.AddListener(ButtonClick);
        speechConfig = SpeechConfig.FromSubscription("", "");
        speechConfig.SetProperty(PropertyId.Speech_SegmentationSilenceTimeoutMs, "4500");
        audioConfig = AudioConfig.FromDefaultMicrophoneInput();
        recognizer = new SpeechRecognizer(speechConfig, audioConfig);
        recognizer.Recognized += RecognizedHandler;
    }
    void Update()
    {
        lock (threadLocker)
        {
            if (outputText != null)
            {
                outputText.text = message;
            }
        }
    }
}
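
A note on the behaviour described above: in continuous recognition the Recognized event appears to fire once per finalized segment of speech, and RecognizedHandler assigns e.Result.Text to message, so each new segment replaces the previous one. One possible workaround, sketched here under that assumption, is to accumulate the segments rather than extend the segment length. TranscriptBuffer is a hypothetical helper written for illustration; it is not part of the Speech SDK.

```csharp
using System;
using System.Text;

// Hypothetical helper (not part of the Speech SDK): collects the text of
// successive recognized segments into a single transcript. A lock is used
// because SDK events may be raised on a thread other than Unity's main thread.
public sealed class TranscriptBuffer
{
    private readonly object gate = new object();
    private readonly StringBuilder text = new StringBuilder();

    // Append one recognized segment; null, empty, or whitespace-only segments are skipped.
    public void Append(string segment)
    {
        if (string.IsNullOrWhiteSpace(segment)) return;
        lock (gate)
        {
            if (text.Length > 0) text.Append(' ');
            text.Append(segment.Trim());
        }
    }

    // Return everything accumulated so far as one string.
    public string Snapshot()
    {
        lock (gate) { return text.ToString(); }
    }

    // Reset the buffer, e.g. before starting a new recording session.
    public void Clear()
    {
        lock (gate) { text.Clear(); }
    }
}
```

In the script above, this could be used by calling Append(e.Result.Text) inside RecognizedHandler when e.Result.Reason is ResultReason.RecognizedSpeech, assigning Snapshot() to message, and calling Clear() just before StartContinuousRecognitionAsync so each button press starts a fresh transcript.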