Speech recognition taking a long time using cognitive speech to text service PushAudioStream with Twilio Call

swarnavageotech 1 Reputation point
2021-04-08T06:44:44.757+00:00

I am trying to transcribe a Twilio call. To do so, I am using Azure Cognitive Services Speech to Text.

The problem occurs when I send audio to the service by writing to a PushAudioStream: the recognized event arrives after a long delay, sometimes as much as 1 minute late.

I am using audioop.ulaw2lin(chunks[0], 4) to convert the Twilio voice chunks to a compatible format before writing them to the stream.

Any help is appreciated.

Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.

1 answer

  1. Ramr-msft 17,736 Reputation points
    2021-04-08T11:48:41.443+00:00

    @swarnavageotech Thanks for the question. Can you please add more details, such as the Speech SDK version that you are using, and share your sample code?
    PushAudioInputStream and PullAudioInputStream now send WAV header information to the Speech Service based on the AudioStreamFormat optionally specified when they were created. Customers must now use a supported audio input format. Any other format may produce sub-optimal recognition results or cause other issues.

    Please follow the documentation on using codec-compressed audio input with the Speech SDK.
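To illustrate the point about matching the stream format, here is a minimal sketch, not a confirmed fix. It assumes the Twilio media payloads are base64-encoded 8 kHz mono mu-law audio, and that the delay may come from a format mismatch: ulaw2lin with a width of 4 produces 32-bit samples, while a PushAudioInputStream created without an explicit format expects 16 kHz, 16-bit mono PCM. The helper names (ulaw_to_pcm16, decode_twilio_payload, make_push_stream) are hypothetical, and the decoder is a pure-Python stand-in intended to match audioop.ulaw2lin(data, 2).

```python
import base64
import struct


def ulaw_to_pcm16(ulaw_bytes: bytes) -> bytes:
    """Decode G.711 mu-law bytes to 16-bit little-endian signed PCM."""
    samples = []
    for u in ulaw_bytes:
        u = ~u & 0xFF                 # mu-law bytes are stored inverted
        sign = u & 0x80
        exponent = (u >> 4) & 0x07
        mantissa = u & 0x0F
        magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
        samples.append(-magnitude if sign else magnitude)
    return struct.pack("<%dh" % len(samples), *samples)


def decode_twilio_payload(payload_b64: str) -> bytes:
    """Assumption: payload_b64 is the base64 mu-law payload of one media message."""
    return ulaw_to_pcm16(base64.b64decode(payload_b64))


def make_push_stream():
    """Create a push stream whose declared format matches the decoded audio."""
    # Imported lazily so the decoding helpers above run without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    # Assumption: 8 kHz, 16-bit, mono, i.e. what the decoder above emits.
    fmt = speechsdk.audio.AudioStreamFormat(
        samples_per_second=8000, bits_per_sample=16, channels=1)
    push_stream = speechsdk.audio.PushAudioInputStream(stream_format=fmt)
    audio_config = speechsdk.audio.AudioConfig(stream=push_stream)
    return push_stream, audio_config
```

With this in place, each incoming media message could be forwarded with push_stream.write(decode_twilio_payload(payload)), and audio_config passed to the SpeechRecognizer. Whether this removes the delay depends on the SDK version and the rest of the code, which the question does not show.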

