Speech recognition taking a long time using cognitive speech to text service PushAudioStream with Twilio Call

Question

I am trying to transcribe a Twilio call. To do so, I am using Azure Cognitive Services Speech to Text.

The problem occurs when I send audio to the service by writing to a PushAudioStream: the recognized event arrives after a long delay, sometimes as much as 1 minute late.

I am using audioop.ulaw2lin(chunks[0],4) to convert the Twilio voice chunks to a compatible format before writing them to the stream.

Any help is appreciated.

Answer

@swarnavageotech Thanks for the question. Could you please add more details: which Speech SDK version are you using? Please also share your sample code.
PushAudioInputStream and PullAudioInputStream now send wav header information to the Speech Service based on AudioStreamFormat, optionally specified when they were created. Customers must now use the supported audio input format. Any other formats will get sub-optimal recognition results or may cause other issues.
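
To illustrate the point about matching formats, here is a minimal sketch, not a confirmed fix for this thread. It assumes the call audio arrives as Twilio Media Streams messages (base64-encoded 8 kHz mono mu-law) and that the push stream should declare 8 kHz, 16-bit, mono PCM. Note that a width of 4 in audioop.ulaw2lin produces 32-bit samples, which would not match a 16-bit stream format; whether that explains the delay reported here is not confirmed. The helper names (ulaw_to_pcm16, decode_twilio_payload, make_push_stream) are hypothetical, and the decoder is written by hand because audioop was removed in Python 3.13; on older versions audioop.ulaw2lin(data, 2) should give the same 16-bit result.

```python
import base64
import struct


def ulaw_to_pcm16(ulaw_bytes: bytes) -> bytes:
    """Decode G.711 mu-law bytes into 16-bit signed little-endian PCM."""
    samples = []
    for byte in ulaw_bytes:
        u = ~byte & 0xFF                  # mu-law bytes are stored bit-inverted
        exponent = (u >> 4) & 0x07
        mantissa = u & 0x0F
        magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
        samples.append(-magnitude if u & 0x80 else magnitude)
    return struct.pack("<%dh" % len(samples), *samples)


def decode_twilio_payload(payload_b64: str) -> bytes:
    """Turn the base64 payload of one Twilio media message into PCM16 bytes."""
    return ulaw_to_pcm16(base64.b64decode(payload_b64))


def make_push_stream():
    """Create a push stream whose declared format matches the decoded audio.

    Assumption: 8000 Hz, 16-bit, mono, matching the output of ulaw_to_pcm16.
    The import is local so the decoding helpers above run without the SDK.
    """
    import azure.cognitiveservices.speech as speechsdk

    fmt = speechsdk.audio.AudioStreamFormat(
        samples_per_second=8000, bits_per_sample=16, channels=1)
    stream = speechsdk.audio.PushAudioInputStream(stream_format=fmt)
    audio_config = speechsdk.audio.AudioConfig(stream=stream)
    return stream, audio_config
```

In use, audio_config would be passed to a SpeechRecognizer running continuous recognition, and each incoming media message would be written with stream.write(decode_twilio_payload(message["media"]["payload"])), followed by stream.close() when the call ends.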

Please follow the documentation on using codec-compressed audio input with the Speech SDK.
