Here's the link to download 'file.mp3': https://we.tl/t-kdLEydW3xn
Having a problem with InitialSilenceTimeout for SpeechAssessment API
VINICIUS FELIZATTI JACINTO
I'm using the SpeechAssessment API in an English education app to rate students' speech. I'm now testing the API and have been having trouble with the audio files generated by the mobile app. The API doesn't seem to recognize the audio and returns the following response:
{
    "RecognitionStatus": "InitialSilenceTimeout",
    "Offset": 5100000,
    "Duration": 0
}
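For reading this response: the Speech REST API reports Offset and Duration in 100-nanosecond ticks, so the values can be converted to seconds. Below is a minimal sketch of that conversion; the helper name describe_result is made up for illustration and is not part of any SDK.

```python
import json

TICKS_PER_SECOND = 10_000_000  # Offset/Duration are given in 100-nanosecond units


def describe_result(response_text):
    """Summarize a recognition response: status plus offset and duration in seconds.

    describe_result is a hypothetical helper, not an SDK function.
    """
    result = json.loads(response_text)
    return {
        "status": result.get("RecognitionStatus"),
        "offset_s": result.get("Offset", 0) / TICKS_PER_SECOND,
        "duration_s": result.get("Duration", 0) / TICKS_PER_SECOND,
    }


sample = '{"RecognitionStatus": "InitialSilenceTimeout", "Offset": 5100000, "Duration": 0}'
print(describe_result(sample))
```

For the response above, the offset works out to 0.51 seconds and the duration to 0.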
Here is my code:
import requests
import base64
import json
import time
import azure.cognitiveservices.speech as speechsdk  # needed for the SpeechConfig lines below

subscriptionKey = "xxxxxxxxx" # replace this with your subscription key
region = "eastus" # replace this with the region corresponding to your subscription key, e.g. westus, eastasia
speech_config = speechsdk.SpeechConfig(subscription=subscriptionKey, region=region)
# Set the initial silence timeout to 50000 ms (50 seconds)
# Note: speech_config is never passed to the requests.post() call below, so it does not affect the REST request
speech_config.set_service_property("InitialSilenceTimeoutMs", "50000", speechsdk.ServicePropertyChannel.UriQueryParameter)
# a generator which reads audio data chunk by chunk
# the audio_source can be any audio input stream which provides read() method, e.g. audio file, microphone, memory stream, etc.
def get_chunk(audio_source, chunk_size=1024):
    while True:
        #time.sleep(chunk_size / 32000) # to simulate human speaking rate
        chunk = audio_source.read(chunk_size)
        if not chunk:
            # record when the upload finished; the latency calculation at the end needs this
            global uploadFinishTime
            uploadFinishTime = time.time()
            break
        yield chunk
# build pronunciation assessment parameters
referenceText = 'In a dimly lit forest, a solitary child stands at the edge of a misty clearing, her small figure illuminated by a shaft of golden sunlight piercing through the dense canopy above.'
pronAssessmentParamsJson = "{\"ReferenceText\":\"%s\",\"GradingSystem\":\"HundredMark\",\"Dimension\":\"Comprehensive\"}" % referenceText
pronAssessmentParamsBase64 = base64.b64encode(bytes(pronAssessmentParamsJson, 'utf-8'))
pronAssessmentParams = str(pronAssessmentParamsBase64, "utf-8")
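As a side note on the header construction above: building the JSON with % formatting breaks if the reference text ever contains a double quote or backslash. A minimal sketch of the same header built with json.dumps follows; the function name build_pron_assessment_header and the sample sentence are hypothetical, and the parameter names are the ones already used in the code above.

```python
import base64
import json


def build_pron_assessment_header(reference_text):
    """Return the base64-encoded JSON for the Pronunciation-Assessment header.

    Hypothetical helper: json.dumps handles quoting and escaping of the text.
    """
    params = {
        "ReferenceText": reference_text,
        "GradingSystem": "HundredMark",
        "Dimension": "Comprehensive",
    }
    params_json = json.dumps(params)
    return base64.b64encode(params_json.encode("utf-8")).decode("utf-8")


# Round trip: decode the header again to confirm the text survived intact
header_value = build_pron_assessment_header('She said "hello" to the class.')
decoded = json.loads(base64.b64decode(header_value))
print(decoded["ReferenceText"])
```

The returned string can be used in place of pronAssessmentParams in the headers.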
# build request
url = "https://%s.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language=en-us" % region
headers = {
    'Accept': 'application/json;text/xml',
    'Connection': 'Keep-Alive',
    'Content-Type': 'audio/mp3; codecs=audio/pcm; samplerate=16000',
    'Ocp-Apim-Subscription-Key': subscriptionKey,
    'Pronunciation-Assessment': pronAssessmentParams,
    'Transfer-Encoding': 'chunked',
    'Expect': '100-continue'
}
audioFile = open('file.mp3','rb')
# send request with chunked data
response = requests.post(url=url, data=get_chunk(audioFile), headers=headers)
getResponseTime = time.time()
audioFile.close()
resultJson = json.loads(response.text)
print(json.dumps(resultJson, indent=4))
latency = getResponseTime - uploadFinishTime
print("Latency = %sms" % int(latency * 1000))
Can you help me solve this problem? Is there a parameter I can pass to increase the timeout for the API, or is there another way I should handle the audio?
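One thing that may be worth checking before sending the file (an assumption, not confirmed in this post): as far as the documentation for the speech-to-text REST API for short audio lists, the accepted inputs are WAV (PCM) and OGG (OPUS), while the request above sends an MP3 file with a Content-Type that mixes audio/mp3 and codecs=audio/pcm. A rough sketch for checking what container a file really is from its first bytes is shown below; sniff_audio_container and sniff_file are hypothetical helpers and the checks are heuristic, not exhaustive.

```python
def sniff_audio_container(header_bytes):
    """Guess the audio container from the first bytes of a file (heuristic only)."""
    if header_bytes[:4] == b"RIFF" and header_bytes[8:12] == b"WAVE":
        return "wav"
    if header_bytes[:4] == b"OggS":
        return "ogg"
    if header_bytes[:3] == b"ID3":
        return "mp3"  # MP3 file starting with an ID3v2 tag
    # MPEG audio frame sync: the first 11 bits of a frame are all set
    if len(header_bytes) >= 2 and header_bytes[0] == 0xFF and (header_bytes[1] & 0xE0) == 0xE0:
        return "mp3"
    return "unknown"


def sniff_file(path):
    """Read the first 12 bytes of a file and guess its container."""
    with open(path, "rb") as f:
        return sniff_audio_container(f.read(12))


# Synthetic headers, used here instead of a real file
print(sniff_audio_container(b"RIFF\x24\x00\x00\x00WAVE"))
print(sniff_audio_container(b"ID3\x04\x00\x00\x00\x00\x00\x00\x00\x00"))
```

Calling sniff_file('file.mp3') on the uploaded recording would show whether the mobile app is really producing MP3 data; if it is, converting it to 16 kHz mono PCM WAV before upload is one option to try.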
Azure AI Speech