How to get sentence word timestamp results for real-time speech recognition?

莓草 0

I am using the Go SDK.

This is my Go code:

func (m *microsoft) Do(ctx context.Context, path string) (string, error) {
	defer os.Remove(path)
	accessKeyConfig := AccessKeyList[rand.Intn(len(AccessKeyList))]
	subscription := accessKeyConfig.Key
	region := accessKeyConfig.Region
	file := path
	audioConfig, err := audio.NewAudioConfigFromWavFileInput(file)
	if err != nil {
		fwlog.New(ctx).Info("", "audioConfigErr")
		return "", err
	}
	defer audioConfig.Close()
	config, err := speech.NewSpeechConfigFromSubscription(subscription, region)
	if err != nil {
		fwlog.New(ctx).Info("", "configErr")
		return "", err
	}
	config.RequestWordLevelTimestamps()
	defer config.Close()
	speechRecognizer, err := speech.NewSpeechRecognizerFromConfig(config, audioConfig)
	if err != nil {
		fwlog.New(ctx).Info("", "speechRecognizerErr")
		return "", err
	}
	defer speechRecognizer.Close()
	speechRecognizer.SessionStarted(func(event speech.SessionEventArgs) {
		defer event.Close()
		fmt.Println("Session Started (ID=", event.SessionID, ")")
	})
	speechRecognizer.Recognizing(recognizingHandler)
	speechRecognizer.Recognized(recognizedHandler)
	//speechRecognizer.Recognizing(recognizedHandler)
	speechRecognizer.SessionStopped(func(event speech.SessionEventArgs) {
		defer event.Close()
		fmt.Println("Session Stopped (ID=", event.SessionID, ")")
	})

	task := speechRecognizer.RecognizeOnceAsync()
	var outcome speech.SpeechRecognitionOutcome
	select {
	case outcome = <-task:
	case <-time.After(120 * time.Second):
		fmt.Println("Timed out")
		return "", errors.New("Timed out")
	}
	defer outcome.Close()
	if outcome.Error != nil {
		fwlog.New(ctx).Info("", "outcomeErr")
		return "", outcome.Error
	}
	return outcome.Result.Text, nil
}

navba-MSFT 24,910 Reputation points Microsoft Employee

2024-07-05T10:43:46.22+00:00
@莓草 Welcome to the Microsoft Q&A forum, and thank you for posting your query here!

While researching this, I found how it is done in the Java SDK.

Following the same approach in the Go SDK, could you please check whether the line below helps?

jsonText := outcome.Result.Properties.GetProperty(common.SpeechServiceResponseJSONResult, "")
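For readers who want to go from that JSON string to per-word timestamps, here is a minimal sketch using only the Go standard library. It assumes the detailed result carries an `NBest` array whose entries hold a `Words` list with `Word`, `Offset`, and `Duration` fields, with offsets and durations in 100-nanosecond ticks. The struct names, the `ParseWords` and `ticksToDuration` helpers, and the sample payload are illustrative, not part of the SDK; check them against the JSON your service actually returns.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Word is one entry of the "Words" array in the detailed result.
// Offset and Duration are assumed to be in 100-nanosecond ticks.
type Word struct {
	Word     string `json:"Word"`
	Offset   int64  `json:"Offset"`
	Duration int64  `json:"Duration"`
}

// NBestEntry is one recognition alternative (only the fields used here).
type NBestEntry struct {
	Confidence float64 `json:"Confidence"`
	Display    string  `json:"Display"`
	Words      []Word  `json:"Words"`
}

// DetailedResult models the parts of the JSON payload this sketch needs.
type DetailedResult struct {
	RecognitionStatus string       `json:"RecognitionStatus"`
	DisplayText       string       `json:"DisplayText"`
	NBest             []NBestEntry `json:"NBest"`
}

// ticksToDuration converts 100 ns ticks into a time.Duration.
func ticksToDuration(ticks int64) time.Duration {
	return time.Duration(ticks) * 100 * time.Nanosecond
}

// ParseWords returns the word list of the first NBest alternative,
// which is assumed here to be the highest-ranked one.
func ParseWords(jsonText string) ([]Word, error) {
	var r DetailedResult
	if err := json.Unmarshal([]byte(jsonText), &r); err != nil {
		return nil, err
	}
	if len(r.NBest) == 0 {
		return nil, fmt.Errorf("no NBest entries in result (status %q)", r.RecognitionStatus)
	}
	return r.NBest[0].Words, nil
}

// sampleJSON is a hand-written illustrative payload, not real service output.
const sampleJSON = `{
  "RecognitionStatus": "Success",
  "Offset": 5000000,
  "Duration": 12000000,
  "DisplayText": "Hello world.",
  "NBest": [{
    "Confidence": 0.95,
    "Display": "Hello world.",
    "Words": [
      {"Word": "hello", "Offset": 5000000, "Duration": 4000000},
      {"Word": "world", "Offset": 10000000, "Duration": 7000000}
    ]
  }]
}`

func main() {
	words, err := ParseWords(sampleJSON)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	// Print each word with its start and end time.
	for _, w := range words {
		start := ticksToDuration(w.Offset)
		end := start + ticksToDuration(w.Duration)
		fmt.Printf("%-8s %v -> %v\n", w.Word, start, end)
	}
}
```

In the recognizer code from the question, `jsonText` from the line above would be passed to `ParseWords` in place of `sampleJSON`.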

If you have any follow-up questions, please let me know. I would be happy to help.
莓草 0 Reputation points

2024-07-05T10:58:47.6366667+00:00

Thank you, it works now.
navba-MSFT 24,910 Reputation points Microsoft Employee

2024-07-05T13:22:53.3133333+00:00

@莓草 If the answer helps, please do not forget to "Accept the answer" and "up-vote" it; this can be beneficial to other community members.

navba-MSFT 24,910 Reputation points Microsoft Employee

2024-07-05T13:23:28.6533333+00:00

@莓草 If the answer above helps, please do not forget to "Accept the answer" and "up-vote" it; this can be beneficial to other community members.

莓草 0 Reputation points

2024-07-08T01:46:11.1933333+00:00

How can I specify the language to be recognized? Please help me.

navba-MSFT 24,910 Reputation points Microsoft Employee

2024-07-08T05:30:28.6466667+00:00

@莓草 Thanks for getting back. To specify the recognition language, you can use SetSpeechRecognitionLanguage, which sets the input language for the speech recognizer. More info here.

speechConfig, err := speech.NewSpeechConfigFromSubscription("YourSubscriptionKey", "YourServiceRegion")
if err != nil {
	// Handle error...
}
speechConfig.SetSpeechRecognitionLanguage("en-US")

If the answer above helps, please do not forget to "Accept the answer" and "up-vote" it; this can be beneficial to other community members.

莓草 0 Reputation points

2024-07-08T07:26:51.07+00:00

Thank you, it works.

莓草 0 Reputation points

2024-07-08T07:39:16.2+00:00

The return value seems incorrect; the timestamp doesn't match up. There should be a pause between the timestamps of the previous and following sentences, but instead, they are continuous.

莓草 0 Reputation points

2024-07-08T07:51:53.4566667+00:00

It can only recognize about 12 seconds of audio; the rest is not identifiable.

莓草 0 Reputation points

2024-07-08T08:53:52.6+00:00

Also, how can I get the corresponding symbols (punctuation) returned in the Words parameter? No symbols are returned at the moment.

莓草 0 Reputation points

2024-07-08T08:54:35.89+00:00

The three questions above, please take a look.

navba-MSFT 24,910 Reputation points Microsoft Employee

2024-07-08T09:01:24.4366667+00:00

@莓草 The Azure Speech Service should be able to handle audio inputs longer than 12 seconds. If it’s not recognizing the full audio, it could be due to a variety of reasons such as the quality of the audio, the clarity of the speech, or network issues.

As for the timestamps, the Recognizing event is fired for intermediate results, while the Recognized event is fired when a final result is recognized. If you are looking for timestamps for each recognized word, you might want to check the Result object returned in the Recognized event. It should contain a list of NBest transcriptions, and each transcription should have a list of Words with their respective Offset and Duration. You could also enable Speech SDK logging and check the logs for any known issues:

import "github.com/Microsoft/cognitive-services-speech-sdk-go/common"

speechConfig.SetProperty(common.SpeechLogFilename, "LogfilePathAndName")

More info here.
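To check whether the returned timestamps really are continuous, the Offset and Duration values of consecutive words can be compared directly. The sketch below is one way to do that, assuming both fields are in 100-nanosecond ticks and that the words have already been decoded from the JSON result. The `Word` and `Pause` types and the `FindPauses` helper are hypothetical names introduced for illustration, not SDK types.

```go
package main

import (
	"fmt"
	"time"
)

// Word mirrors one entry of the Words list in the detailed result.
type Word struct {
	Word     string
	Offset   int64 // start of the word, in 100 ns ticks (assumed unit)
	Duration int64 // length of the word, in 100 ns ticks (assumed unit)
}

// Pause describes the silence between two consecutive words.
type Pause struct {
	After  string        // word spoken before the silence
	Before string        // word spoken after the silence
	Length time.Duration // length of the silence
}

// FindPauses returns every gap between consecutive words that is at
// least threshold long. A gap is the next word's Offset minus the end
// (Offset + Duration) of the previous word.
func FindPauses(words []Word, threshold time.Duration) []Pause {
	var pauses []Pause
	for i := 1; i < len(words); i++ {
		prevEnd := words[i-1].Offset + words[i-1].Duration
		gap := time.Duration(words[i].Offset-prevEnd) * 100 * time.Nanosecond
		if gap >= threshold {
			pauses = append(pauses, Pause{
				After:  words[i-1].Word,
				Before: words[i].Word,
				Length: gap,
			})
		}
	}
	return pauses
}

func main() {
	// Hand-written sample values, not real service output.
	words := []Word{
		{"hello", 5000000, 4000000},
		{"world", 10000000, 7000000},
		{"again", 30000000, 5000000},
	}
	for _, p := range FindPauses(words, 500*time.Millisecond) {
		fmt.Printf("pause of %v between %q and %q\n", p.Length, p.After, p.Before)
	}
}
```

If no gap above the threshold shows up where the audio clearly has silence, comparing these values against the SDK log is a reasonable next step.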