Issue with speech-to-text service

Vidyadhar Busam 0 Reputation points
2024-05-16T06:18:15.6466667+00:00

While transcribing the WAV file linked below with Microsoft's Speech-to-Text service, the word "No" spoken at the 57th second is not detected, although "No" is detected at 1:12 and at other places in the file.

The recognized speech is as follows:

RECOGNIZED: {"Confidence":0.86026245,"Lexical":"yeah i need an appointment","ITN":"yeah i need an appointment","MaskedITN":"yeah i need an appointment","Display":"Yeah, I need an appointment.","Words":[{"Word":"yeah","Offset":121200000,"Duration":3600000},{"Word":"i","Offset":124800000,"Duration":400000},{"Word":"need","Offset":125200000,"Duration":2000000},{"Word":"an","Offset":127200000,"Duration":1200000},{"Word":"appointment","Offset":128400000,"Duration":6000000}]}

RECOGNIZED: {"Confidence":0.702717,"Lexical":"yes","ITN":"yes","MaskedITN":"yes","Display":"Yes.","Words":[{"Word":"yes","Offset":228900000,"Duration":5600000}]}

RECOGNIZED: {"Confidence":0.4998704,"Lexical":"morning","ITN":"morning","MaskedITN":"morning","Display":"Morning.","Words":[{"Word":"morning","Offset":355500000,"Duration":7200000}]}

2024-05-16T05:59:30.827Z [debug] microsoft-stt :: No speech could be recognized
2024-05-16T05:59:30.829Z [debug] microsoft-stt :: No speech could be recognized

RECOGNIZED: {"Confidence":0.797473,"Lexical":"no","ITN":"no","MaskedITN":"no","Display":"No.","Words":[{"Word":"no","Offset":707600000,"Duration":4000000}]}

RECOGNIZED: {"Confidence":0.76081145,"Lexical":"yes","ITN":"yes","MaskedITN":"yes","Display":"Yes.","Words":[{"Word":"yes","Offset":812000000,"Duration":6400000}]}

RECOGNIZED: {"Confidence":0.54089016,"Lexical":"yes","ITN":"yes","MaskedITN":"yes","Display":"Yes.","Words":[{"Word":"yes","Offset":944700000,"Duration":6800000}]}

RECOGNIZED: {"Confidence":0.38486534,"Lexical":"no","ITN":"no","MaskedITN":"no","Display":"No.","Words":[{"Word":"no","Offset":1121500000,"Duration":5600000}]}
2024-05-16T06:00:02.350Z [debug] microsoft-stt :: CANCELED: Reason=1
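
To line the log up against the audio, it helps to convert the `Offset` and `Duration` values into seconds. The sketch below assumes these fields are in 100-nanosecond ticks, which is how the Speech service's detailed output reports them. The helper names are hypothetical, and the two payloads are trimmed copies of the "morning" and first "no" results above.

```python
import json

# Assumption: Offset and Duration are expressed in 100-nanosecond ticks.
TICKS_PER_SECOND = 10_000_000


def ticks_to_seconds(ticks: int) -> float:
    """Convert a tick count into seconds."""
    return ticks / TICKS_PER_SECOND


def word_timings(result_json: str):
    """Return (word, start_seconds, end_seconds) for each word in one result."""
    result = json.loads(result_json)
    return [
        (
            w["Word"],
            ticks_to_seconds(w["Offset"]),
            ticks_to_seconds(w["Offset"] + w["Duration"]),
        )
        for w in result["Words"]
    ]


# Trimmed copies of two RECOGNIZED payloads from the log above.
morning = '{"Display":"Morning.","Words":[{"Word":"morning","Offset":355500000,"Duration":7200000}]}'
first_no = '{"Display":"No.","Words":[{"Word":"no","Offset":707600000,"Duration":4000000}]}'

for payload in (morning, first_no):
    for word, start, end in word_timings(payload):
        print(f"{word}: {start:.2f}s to {end:.2f}s")
```

Under that assumption, "morning" ends at about 36.27 s and the next recognized word, "no", starts at about 70.76 s. The missing "No" at 57 s therefore falls in the stretch where the log shows the two "No speech could be recognized" messages.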

Input wave file:

https://meeamitech-my.sharepoint.com/:u:/p/vidyadhar_busam/EdmkpIY-zDlCuZFhzQRq0qYBr7PmG73wJaT0hQYW4hZdxg?nav=eyJyZWZlcnJhbEluZm8iOnsicmVmZXJyYWxBcHAiOiJPbmVEcml2ZUZvckJ1c2luZXNzIiwicmVmZXJyYWxBcHBQbGF0Zm9ybSI6IldlYiIsInJlZmVycmFsTW9kZSI6InZpZXciLCJyZWZlcnJhbFZpZXciOiJNeUZpbGVzTGlua0NvcHkifX0&e=Div3Z2

Please fix this issue.
Thanks.

Azure AI Speech
An Azure service that integrates speech processing into apps and services.

1 answer

  1. santoshkc 5,335 Reputation points Microsoft Vendor
    2024-05-16T08:17:01.7233333+00:00

    Hi @Vidyadhar Busam,

    Thank you for reaching out to Microsoft Q&A forum!

    Based on the sample you provided, the Speech-to-Text service is not detecting the word "No" at the 57th second of the WAV file. This could be due to the initial and intermittent silence timeouts that the service uses to detect the start and end of speech segments.

    We suggest checking whether the issue reproduces with other WAV files and confirming that the WAV file meets the requirements of the Speech-to-Text service. We also recommend using a high-quality audio file with clear speech to ensure accurate transcription.

    If the issue persists, we recommend raising a support request with Microsoft Azure support for further assistance. Thank you for bringing this to our attention.

    I hope this helps. If you have any further queries, do let us know.


    If this answers your query, please click "Accept Answer" and "Yes" for "Was this answer helpful".
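
As a rough way to act on the suggestion that the WAV file should meet the service's requirements, the sketch below reads the file header with Python's standard `wave` module. It assumes the commonly documented default input format for the Speech SDK (mono, 16-bit PCM, 8 kHz or 16 kHz); please confirm this against the current documentation for your SDK version. The function names and the file name in the usage comment are hypothetical.

```python
import wave


def describe_wav(path: str) -> dict:
    """Read basic format details from a PCM WAV file header."""
    # wave.open raises wave.Error if the file is not uncompressed PCM.
    with wave.open(path, "rb") as wav:
        frame_rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": frame_rate,
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_s": wav.getnframes() / frame_rate,
        }


def format_problems(info: dict) -> list:
    """List deviations from the assumed format: mono, 16-bit, 8 or 16 kHz."""
    problems = []
    if info["channels"] != 1:
        problems.append(f"expected 1 channel, found {info['channels']}")
    if info["sample_rate_hz"] not in (8000, 16000):
        problems.append(f"expected 8000 or 16000 Hz, found {info['sample_rate_hz']}")
    if info["bits_per_sample"] != 16:
        problems.append(f"expected 16-bit samples, found {info['bits_per_sample']}")
    return problems


# Usage (hypothetical file name):
#   info = describe_wav("input.wav")
#   print(info, format_problems(info) or "format looks fine")
```

An empty list only means the header matches the assumed format. It says nothing about how loud or clear the speech is at the 57th second, so listening to that part of the recording is still worthwhile.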