Azure Pronunciation-Assessment showing InitialSilenceTimeout for audio files

Alexander Brown 0 Reputation points
2023-09-20T10:01:52.0433333+00:00

Hi all,
I have been having some issues when trying to use Azure speech assessment on audio files uploaded to my NextJS application. I know the fetch request to Azure works, since audio recorded directly in the web browser functions as expected, and I've found a select few .wav files that work as intended, so I don't think it's specifically an Azure issue, but I can't rule that out.

A successful file assessment looks like this:
{RecognitionStatus: 'Success', Offset: 1300000, Duration: 8900000, NBest: Array(1), DisplayText: 'The A.'}

However, the vast majority of files I upload and try to send for Azure speech assessment return an error like this:
Not successful {RecognitionStatus: 'InitialSilenceTimeout', Offset: 8500000, Duration: 0}

Since I'm working on a private code base for a company, I can't share too much about our application, but if someone has an idea why this unusual behaviour occurs, I would greatly appreciate it. Here is some of the code showing how I convert the audio files to a blob and send them to a separate function to be assessed:

let url = URL.createObjectURL(file);
setFileUrl(url);
fetch(url).then(async function (response) {
  if (response.ok) {
    const newBlob = new Blob([file], { type: AZURE_CODEC });
    const result = await fetchAssessment(newBlob, text || freeInput);
    if (result) {
      const output = sentenceAssessment(result.Words, phonemeList);
      setNBestPhonemes(output);
      saveAssessment(result, newBlob, wordAssessmentData, setWordAssessmentData, completionId);
    }
  }
});

I've checked the audio files and compared the few that work with the others that don't, but can't find any obvious difference between them. If you need to know additional information about the audio files I can do my best to find whatever's necessary.
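For reference, this is roughly how I've been inspecting the files so far: a quick sketch (not production code) that walks the RIFF chunks of a .wav file and pulls out the format fields, so failing and working files can be compared side by side. As I understand it, Azure Speech expects 16 kHz, 16-bit, mono PCM, but that expectation is my assumption here, not something I've confirmed from our logs.

```javascript
// Quick sketch: read the RIFF/"fmt " header of a WAV file so the
// sample rate, channel count, and bit depth of working vs. failing
// files can be compared. Takes an ArrayBuffer (e.g. from file.arrayBuffer()).
function parseWavHeader(arrayBuffer) {
  const view = new DataView(arrayBuffer);
  const tag = (offset) =>
    String.fromCharCode(
      view.getUint8(offset), view.getUint8(offset + 1),
      view.getUint8(offset + 2), view.getUint8(offset + 3)
    );
  if (tag(0) !== "RIFF" || tag(8) !== "WAVE") {
    throw new Error("Not a RIFF/WAVE file");
  }
  // Walk the chunks until "fmt " is found (it is not always at byte 12).
  let offset = 12;
  while (offset + 8 <= view.byteLength) {
    const chunkId = tag(offset);
    const chunkSize = view.getUint32(offset + 4, true); // sizes are little-endian
    if (chunkId === "fmt ") {
      return {
        audioFormat: view.getUint16(offset + 8, true), // 1 = uncompressed PCM
        channels: view.getUint16(offset + 10, true),
        sampleRate: view.getUint32(offset + 12, true),
        bitsPerSample: view.getUint16(offset + 22, true),
      };
    }
    offset += 8 + chunkSize + (chunkSize % 2); // chunks are word-aligned
  }
  throw new Error("No fmt chunk found");
}
```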

Thank you!

Azure AI Speech
An Azure service that integrates speech processing into apps and services.

1 answer

  1. romungi-MSFT 45,961 Reputation points Microsoft Employee
    2023-09-21T07:53:24.26+00:00

    @Alexander Brown I do not have much experience with NextJS, but I would suggest the following to debug the scenario.

    1. Try the failing files in Azure Speech Studio. Upload a file directly on the pronunciation assessment page of the Speech Studio and check whether it works; you will also need to input the reference text for the audio file to complete the assessment. If the assessment fails there too, then the file has an issue with its audio format. If it works, then something is wrong in the code while reading the file from the blob, and the next steps should help you check further.
    2. Check the file permissions in the blob. If the file cannot be read and its content is not streamed to the API, the service might simply be seeing no voice audio and failing with the timeout. Are the files that do work also from the same blob container?
    3. What is the reference text passed along with the file? Is it the same for every file, or does it need to be updated per file? What is the freeInput value being passed, and does the fetch request include any text when calling the API?
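
    For step 2, one quick sanity check (an illustrative sketch, not part of the Speech SDK) is to scan the 16-bit PCM samples of a failing file and report the peak level. If the bytes the code actually reads are near-silent, that would directly explain the InitialSilenceTimeout. The 44-byte data offset below assumes a minimal canonical WAV header; real files may place the data chunk elsewhere.

```javascript
// Illustrative check: peak amplitude of 16-bit little-endian PCM samples.
// Returns a value in [0, 1]; ~0 means the audio being sent is silence.
function peakLevel(pcmArrayBuffer, dataOffset = 44) {
  const view = new DataView(pcmArrayBuffer);
  let peak = 0;
  for (let o = dataOffset; o + 2 <= view.byteLength; o += 2) {
    peak = Math.max(peak, Math.abs(view.getInt16(o, true)));
  }
  return peak / 32768; // normalise against full scale for signed 16-bit
}
```

    Running this on the blob's bytes (file.arrayBuffer()) for a working file and a failing file should tell you whether the failures are a content problem or a format problem.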

    I hope this helps!! Thanks!!

    If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.

