InitialSilenceTimeout when using the REST API

Rhenan Bartels 21 Reputation points
2022-07-27T17:44:47.897+00:00

I am using the Cognitive Services REST API to send chunks of audio (approximately 20 s each). However, some chunks begin with a long period without a human voice: just silence or the vignette sound.

In these cases, we are receiving InitialSilenceTimeout and no transcriptions in the API response.

Is there a way to tell the REST API to wait longer for the speech to start?

This is the endpoint we are using to perform speech-to-text:

`https://eastus.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language=pt-BR`  

Thanks in advance,

Rhenan


Accepted answer
  1. romungi-MSFT 48,906 Reputation points Microsoft Employee Moderator
    2022-07-28T05:40:42.017+00:00

    @Rhenan Bartels You are probably seeing a response like the one below when the audio file starts with a long silence.

    {  
        "RecognitionStatus": "InitialSilenceTimeout",  
        "Offset": 50000000,  
        "Duration": 0  
    }  
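
A response like this can be detected on the client before deciding how to handle the chunk. The snippet below is only a minimal sketch: `summarize_response` is a hypothetical helper name, and it assumes the simple output format, where recognized text arrives in `DisplayText`. `Offset` and `Duration` are reported in 100-nanosecond units, so the `Offset` of 50000000 above corresponds to 5 seconds into the audio.

```python
import json

# Offset and Duration are expressed in 100-nanosecond units.
TICKS_PER_SECOND = 10_000_000


def summarize_response(body: str) -> dict:
    """Parse a short-audio REST response and flag the initial-silence case."""
    payload = json.loads(body)
    status = payload.get("RecognitionStatus")
    return {
        "status": status,
        "timed_out_on_silence": status == "InitialSilenceTimeout",
        "offset_seconds": payload.get("Offset", 0) / TICKS_PER_SECOND,
        # DisplayText is only present when speech was recognized.
        "text": payload.get("DisplayText", ""),
    }


sample = '{"RecognitionStatus": "InitialSilenceTimeout", "Offset": 50000000, "Duration": 0}'
print(summarize_response(sample))
```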
    

    This happens because the REST API for short audio does not expose a setting for the initial silence timeout. The only parameters it supports are the ones listed in the documentation for that API.
    To wait longer for speech to start, switch to one of the Speech SDKs and set the initial silence timeout property on the speech configuration before creating the recognizer.
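
As a rough illustration of the SDK route, here is a minimal Python sketch using the `azure-cognitiveservices-speech` package. The function name `recognize_chunk`, the 15000 ms value, and the key and file path arguments are assumptions made for the example; the region and language are taken from the endpoint in the question.

```python
# Assumed value for illustration; SDK property values are passed as strings.
INITIAL_SILENCE_TIMEOUT_MS = "15000"


def recognize_chunk(path: str, key: str, region: str = "eastus", language: str = "pt-BR"):
    """Recognize one audio file while allowing a longer initial silence."""
    # Imported inside the function so this sketch loads even without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    speech_config.speech_recognition_language = language
    # Raise the time the service waits for speech to begin.
    speech_config.set_property(
        speechsdk.PropertyId.SpeechServiceConnection_InitialSilenceTimeoutMs,
        INITIAL_SILENCE_TIMEOUT_MS,
    )

    audio_config = speechsdk.audio.AudioConfig(filename=path)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )

    # recognize_once() returns after the first recognized utterance.
    result = recognizer.recognize_once()
    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
        return result.text
    return None
```

Because `recognize_once()` stops after a single utterance, a chunk that contains several utterances would likely need continuous recognition instead.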
