Having problem with InitialSilenceTimeout for SpeechAssessment API

VINICIUS FELIZATTI JACINTO 0

I'm using the SpeechAssessment API in an English education app to rate students' speech. I'm now testing the API and have been having trouble with the audio generated by the mobile app. The API doesn't seem to recognize the audio and returns the following:

{
    "RecognitionStatus": "InitialSilenceTimeout",
    "Offset": 5100000,
    "Duration": 0
}

Here is my code:

import requests
import base64
import json
import time
import azure.cognitiveservices.speech as speechsdk  # needed for SpeechConfig below

subscriptionKey = "xxxxxxxxx" # replace this with your subscription key
region = "eastus" # replace this with the region corresponding to your subscription key, e.g. westus, eastasia
speech_config = speechsdk.SpeechConfig(subscription=subscriptionKey, region=region)
# Try to raise the initial silence timeout to 50 seconds (the value is in milliseconds)
speech_config.set_service_property("InitialSilenceTimeoutMs", "50000", speechsdk.ServicePropertyChannel.UriQueryParameter)

# a generator which reads audio data chunk by chunk
# the audio_source can be any audio input stream which provides read() method, e.g. audio file, microphone, memory stream, etc.
def get_chunk(audio_source, chunk_size=1024):
    while True:
        #time.sleep(chunk_size / 32000) # to simulate human speaking rate
        chunk = audio_source.read(chunk_size)
        if not chunk:
            # record when the upload finished; used for the latency calculation at the end
            global uploadFinishTime
            uploadFinishTime = time.time()
            break
        yield chunk
# build pronunciation assessment parameters
referenceText = 'In a dimly lit forest, a solitary child stands at the edge of a misty clearing, her small figure illuminated by a shaft of golden sunlight piercing through the dense canopy above.'
pronAssessmentParamsJson = "{\"ReferenceText\":\"%s\",\"GradingSystem\":\"HundredMark\",\"Dimension\":\"Comprehensive\"}" % referenceText
pronAssessmentParamsBase64 = base64.b64encode(bytes(pronAssessmentParamsJson, 'utf-8'))
pronAssessmentParams = str(pronAssessmentParamsBase64, "utf-8")
# build request
url = "https://%s.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language=en-us" % region
headers = { 'Accept': 'application/json;text/xml',
            'Connection': 'Keep-Alive',
            'Content-Type': 'audio/mp3; codecs=audio/pcm; samplerate=16000',
            'Ocp-Apim-Subscription-Key': subscriptionKey,
            'Pronunciation-Assessment': pronAssessmentParams,
            'Transfer-Encoding': 'chunked',
            'Expect': '100-continue'
          }
audioFile = open('file.mp3','rb')
    
# send request with chunked data
response = requests.post(url=url, data=get_chunk(audioFile), headers=headers)
getResponseTime = time.time()
audioFile.close()
resultJson = json.loads(response.text)
print(json.dumps(resultJson, indent=4))
latency = getResponseTime - uploadFinishTime
print("Latency = %sms" % int(latency * 1000))

Can you help me solve this problem? Is there a parameter I can pass to increase the timeout for the API, or maybe a way to handle the audio differently?

1 answer

Answer 1

VINICIUS FELIZATTI JACINTO 0

Here's the path to download the 'file.mp3' https://we.tl/t-kdLEydW3xn

Pavankumar Purilla 8,335 Microsoft External Staff Moderator

Hi VINICIUS FELIZATTI JACINTO,
Greetings & Welcome to Microsoft Q&A forum! Thanks for posting your query!
I understand you’re facing issues with InitialSilenceTimeout.

To address this, you can either use an audio-processing library such as pydub or librosa to detect and remove silence from the beginning of the audio file before sending it to the API, or raise the initial silence timeout. Both approaches are shown below as complete Python examples:

1. Trimming silence from the audio file:

This approach uses pydub to trim silence from the beginning of the audio file before sending it to the Azure Speech API.

import requests
import base64
import json
from pydub import AudioSegment
from pydub.silence import detect_leading_silence
# Replace with your Azure Speech subscription details
subscriptionKey = "your_subscription_key"
region = "your_region"
# Load the audio file and trim silence from the beginning only
audio = AudioSegment.from_file('file.mp3')
leading_silence_ms = detect_leading_silence(audio, silence_threshold=-50.0)  # Adjust the threshold (dBFS) based on noise level
trimmed_audio = audio[leading_silence_ms:]
# Convert to 16 kHz, mono, 16-bit PCM so the file matches the Content-Type header below
trimmed_audio = trimmed_audio.set_frame_rate(16000).set_channels(1).set_sample_width(2)
trimmed_audio.export('trimmed_file.wav', format='wav')  # Export trimmed audio to a new file
# Note: SpeechConfig properties only apply to Speech SDK recognizers, not to this REST request, so none is set here
# Generator function to yield audio chunks
def get_chunk(audio_source, chunk_size=2048):
    while True:
        chunk = audio_source.read(chunk_size)
        if not chunk:
            break
        yield chunk
# Prepare pronunciation assessment parameters
referenceText = 'In a...'
pronAssessmentParamsJson = "{\"ReferenceText\":\"%s\",\"GradingSystem\":\"HundredMark\",\"Dimension\":\"Comprehensive\"}" % referenceText
pronAssessmentParamsBase64 = base64.b64encode(bytes(pronAssessmentParamsJson, 'utf-8')).decode("utf-8")
# Build the request URL and headers
url = f"https://{region}.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language=en-us"
headers = {
    'Accept': 'application/json;text/xml',
    'Connection': 'Keep-Alive',
    'Content-Type': 'audio/wav; codecs=audio/pcm; samplerate=16000',  # Set to 16kHz PCM WAV
    'Ocp-Apim-Subscription-Key': subscriptionKey,
    'Pronunciation-Assessment': pronAssessmentParamsBase64,
    'Transfer-Encoding': 'chunked',
}
# Open the trimmed audio file for reading and send the request
with open('trimmed_file.wav', 'rb') as audioFile:  
    response = requests.post(url=url, data=get_chunk(audioFile), headers=headers)
# Process and print the result
resultJson = json.loads(response.text)
print(resultJson)
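
Since the Content-Type header above declares 16 kHz PCM WAV, it can help to confirm that the exported file really has that format before sending it. The following is only a minimal sketch using Python's standard wave module; the helper name check_wav_format and its default values are assumptions made for illustration and are not part of the Speech API.

```python
import wave


def check_wav_format(path, expected_rate=16000, expected_channels=1, expected_sample_width=2):
    """Return a list of mismatches between the WAV header and the expected format.

    Hypothetical helper: the defaults (16 kHz, mono, 16-bit) mirror the
    Content-Type header used in the request above.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != expected_rate:
            problems.append(
                f"sample rate is {wav.getframerate()} Hz, expected {expected_rate} Hz"
            )
        if wav.getnchannels() != expected_channels:
            problems.append(
                f"channel count is {wav.getnchannels()}, expected {expected_channels}"
            )
        if wav.getsampwidth() != expected_sample_width:
            problems.append(
                f"sample width is {wav.getsampwidth()} bytes, expected {expected_sample_width} bytes"
            )
    return problems
```

For example, check_wav_format('trimmed_file.wav') returns an empty list when the file matches, and a list of human-readable mismatches otherwise.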

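2. Using the initial silence timeout without trimming:

This approach passes initialSilenceTimeoutMs as a query parameter on a custom endpoint used by the Speech SDK, so the audio file does not need to be trimmed beforehand. Note that the timeout applies only to the SDK recognizer created in the function below; the REST request later in the script is not affected by it.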

import requests
import base64
import json
import azure.cognitiveservices.speech as speechsdk
# Replace with your Azure Speech subscription details
subscriptionKey = "your_subscription_key"
region = "your_region"
audio_filename = "file.wav"  # Replace with your audio file path
def speech_recognize_once_from_file_with_custom_endpoint_parameters():
    """Performs one-shot speech recognition with input from an audio file, specifying an endpoint with custom parameters."""
    
    initial_silence_timeout_ms = 15 * 1e3  # Set initial silence timeout to 15 seconds
    template = "wss://{}.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?initialSilenceTimeoutMs={:d}"
    
    speech_config = speechsdk.SpeechConfig(subscription=subscriptionKey,
                                           endpoint=template.format(region, int(initial_silence_timeout_ms)))
    
    print("Using endpoint", speech_config.get_property(speechsdk.PropertyId.SpeechServiceConnection_Endpoint))
    
    audio_config = speechsdk.audio.AudioConfig(filename=audio_filename)
    
    # Creates a speech recognizer using a file as audio input
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)
    # Start speech recognition and return after a single utterance is recognized
    result = speech_recognizer.recognize_once()
    # Check the result
    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
        print("Recognized: {}".format(result.text))
    elif result.reason == speechsdk.ResultReason.NoMatch:
        print("No speech could be recognized: {}".format(result.no_match_details))
    elif result.reason == speechsdk.ResultReason.Canceled:
        cancellation_details = result.cancellation_details
        print("Speech Recognition canceled: {}".format(cancellation_details.reason))
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            print("Error details: {}".format(cancellation_details.error_details))
# Prepare pronunciation assessment parameters
referenceText = 'In a d...'
pronAssessmentParamsJson = "{\"ReferenceText\":\"%s\",\"GradingSystem\":\"HundredMark\",\"Dimension\":\"Comprehensive\"}" % referenceText
pronAssessmentParamsBase64 = base64.b64encode(bytes(pronAssessmentParamsJson, 'utf-8')).decode("utf-8")
# Build the request URL and headers
url = f"https://{region}.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language=en-us"
headers = {
    'Accept': 'application/json;text/xml',
    'Connection': 'Keep-Alive',
    'Content-Type': 'audio/wav; codecs=audio/pcm; samplerate=16000',  # Ensure the audio file is in the correct format
    'Ocp-Apim-Subscription-Key': subscriptionKey,
    'Pronunciation-Assessment': pronAssessmentParamsBase64,
    'Transfer-Encoding': 'chunked',
}
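# Stream the file in chunks so the body matches the 'Transfer-Encoding: chunked' header
# (passing the file object directly makes requests send a Content-Length instead)
with open(audio_filename, 'rb') as audioFile:
    response = requests.post(url=url, data=iter(lambda: audioFile.read(2048), b''), headers=headers)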
# Process and print the result
resultJson = json.loads(response.text)
print(resultJson)
# Call the speech recognition function
speech_recognize_once_from_file_with_custom_endpoint_parameters()
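
Both examples build the Pronunciation-Assessment header with % string formatting, which produces invalid JSON if the reference text contains a double quote or a backslash. A possible alternative, shown here only as a sketch, is to let json.dumps do the escaping. The function name build_pron_assessment_header is hypothetical; the parameter names and values are the ones already used in this thread.

```python
import base64
import json


def build_pron_assessment_header(reference_text,
                                 grading_system="HundredMark",
                                 dimension="Comprehensive"):
    """Return the base64-encoded JSON value for the Pronunciation-Assessment header.

    Hypothetical helper: json.dumps escapes quotes and backslashes in the
    reference text, which manual string formatting does not.
    """
    params = {
        "ReferenceText": reference_text,
        "GradingSystem": grading_system,
        "Dimension": dimension,
    }
    params_json = json.dumps(params)
    return base64.b64encode(params_json.encode("utf-8")).decode("ascii")
```

The returned string can be assigned to the 'Pronunciation-Assessment' entry of the headers dictionary in place of pronAssessmentParamsBase64.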

For more information, see: initial_silence_timeout_ms
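
Whichever approach you use, it may help to inspect the RecognitionStatus field of the REST response instead of only printing the JSON. The sketch below assumes that Offset and Duration are reported in 100-nanosecond units, as in the response from the question; the function name describe_result and the message wording are made up for illustration.

```python
TICKS_PER_SECOND = 10_000_000  # assumption: Offset and Duration are in 100-nanosecond units


def describe_result(result):
    """Turn a parsed REST response (a dict) into a short human-readable message."""
    status = result.get("RecognitionStatus", "Unknown")
    offset_s = result.get("Offset", 0) / TICKS_PER_SECOND
    duration_s = result.get("Duration", 0) / TICKS_PER_SECOND
    if status == "Success":
        return f"Speech recognized starting at {offset_s:.2f} s, lasting {duration_s:.2f} s."
    if status == "InitialSilenceTimeout":
        return (f"No speech was detected (offset {offset_s:.2f} s). "
                "Check the audio format and any leading silence.")
    return f"Recognition did not succeed: {status}."


# The response from the question, used as sample input
sample = {"RecognitionStatus": "InitialSilenceTimeout", "Offset": 5100000, "Duration": 0}
print(describe_result(sample))
```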

I hope this information helps! Feel free to reach out if you have any further questions. Thank you!

Pavankumar Purilla 8,335 Reputation points Microsoft External Staff Moderator

2024-10-28T02:41:45.9733333+00:00

Hi VINICIUS FELIZATTI JACINTO,
Greetings of the day!
We haven't heard back from you on the last response and are just checking whether you have a resolution yet. If you do, please share it with the community, as it can be helpful to others. Otherwise, reply with more details and we will try to help.

Pavankumar Purilla 8,335 Reputation points Microsoft External Staff Moderator

2024-10-29T03:12:45.9766667+00:00

Hi VINICIUS FELIZATTI JACINTO,
Greetings of the day!
Just checking back to see whether you have a resolution yet. If you do, please share it with the community, as it can be helpful to others. Otherwise, reply with more details and we will try to help.

VINICIUS FELIZATTI JACINTO 0 Reputation points

2024-10-29T11:44:54.43+00:00

Sorry for the delay. I will test your solution today and let you know if it works.

Pavankumar Purilla 8,335 Reputation points Microsoft External Staff Moderator

2024-10-29T14:15:12.7666667+00:00

Hi VINICIUS FELIZATTI JACINTO,
Please let me know how it works. Thank you.
