Speech to Text

Question

Pedro Rodrigues 0

Good morning,

I am exploring your Speech-to-Text service with the aim of building a service that places calls, asks a question, analyzes the response, and records it. For each call, a separate instance is created to isolate the transcription. The service streams audio to Azure using audioInputStream.Write(data.Data, data.Data.Length), calls StartContinuousRecognitionAsync, and waits for the recognition results in order to record the responses. It works perfectly for a single call, but problems arise when there are multiple calls. How many calls can I transcribe in real time at the same time? Is there a limit on the number of SpeechRecognizer instances I can create? Can I only use one?

Best regards,

Pedro Rodrigues

2 answers

Answer 1

Dillon Silzer 57,831 Volunteer Moderator

Hi Pedro,

If you are using the Free tier, you are limited to 1 concurrent request (call). If you are using the Standard plan, you can make up to 100 concurrent calls:

Online transcription and speech translation

https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-services-quotas-and-limits#online-transcription-and-speech-translation
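
If you want to stay under whichever limit applies, one option is to cap concurrency on the client side. Below is a minimal sketch in Python using only the standard library, not the Speech SDK. `ConcurrencyGate`, `fake_call`, and `demo` are hypothetical names made up for illustration, and `fake_call` merely stands in for one call's transcription session. The limit you pass in would be 1 or 100 depending on the plan, per the numbers above.

```python
import asyncio


class ConcurrencyGate:
    """Caps how many sessions run at once (hypothetical helper, not an SDK class)."""

    def __init__(self, limit: int) -> None:
        self._semaphore = asyncio.Semaphore(limit)
        self.active = 0  # sessions currently running
        self.peak = 0    # highest number of sessions seen running together

    async def run(self, session):
        # Wait until a slot is free, then run the session coroutine.
        async with self._semaphore:
            self.active += 1
            self.peak = max(self.peak, self.active)
            try:
                return await session()
            finally:
                self.active -= 1


async def fake_call(call_id: int) -> str:
    # Stand-in for one phone call's transcription session (assumption).
    await asyncio.sleep(0.01)
    return f"call-{call_id} done"


async def demo(limit: int, calls: int):
    gate = ConcurrencyGate(limit)
    results = await asyncio.gather(
        *(gate.run(lambda i=i: fake_call(i)) for i in range(calls))
    )
    return gate.peak, results
```

Calling `asyncio.run(demo(limit=2, calls=5))` runs all five stand-in calls, but never more than two at the same time; the extra calls simply wait for a free slot.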

If this is helpful, please accept the answer.

Pedro Rodrigues 0 Reputation points

2023-03-30T16:31:51.17+00:00

The problem seems to be this: even though I send the audio stream of each call separately, when I have more than one call, and therefore more than one stream, I stop receiving responses as soon as I hang up the calls, and everything only starts working again when I am back to a single call. My current hypothesis, which seems the most plausible to me, is that although I stream the audio through two different instances, the Azure service treats it as a single stream, and I stop receiving responses because the combined audio no longer makes sense.

Answer 2

Pedro Rodrigues 0


romungi-MSFT 48,911 Reputation points Microsoft Employee Moderator

2023-03-31T08:31:40.3433333+00:00

If separate speech recognizer instances are used and the audio is streamed separately to each instance, then from the service or resource perspective the resource scales up compute to accommodate the increased request rate. This is because the limits are evaluated at connection time, when a recognition request begins. During this period you might see 429 response codes, but these are transient while the service scales up to the increased capacity. Also, at the resource level, the audio for two different requests is not treated as a single stream.
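
Since the 429 responses are transient, a common client-side pattern is to retry the session start with exponential backoff. Here is a rough sketch in plain Python, not Speech SDK code. `TooManyRequests` and `start_with_backoff` are hypothetical names, and `start_session` is whatever function your client uses to open a recognition session; how your client actually surfaces a 429 is an assumption you would need to adapt.

```python
import random
import time


class TooManyRequests(Exception):
    """Hypothetical error standing in for a 429 response from the service."""


def start_with_backoff(start_session, max_attempts=5, base_delay=0.5,
                       sleep=time.sleep):
    """Call start_session(), retrying with exponential backoff on TooManyRequests."""
    for attempt in range(max_attempts):
        try:
            return start_session()
        except TooManyRequests:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the error
            # Double the wait each time and add jitter so that parallel
            # calls do not all retry at the same instant.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```

The `sleep` parameter is only there so the waiting can be swapped out, for example in a unit test.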

You might also want to use the CreatePushStream() or CreatePullStream() method to write data instead of using the write() method.
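
One client-side thing worth ruling out is two calls accidentally sharing the same stream object. The sketch below, in plain Python with stand-in stream objects rather than the Speech SDK, keeps exactly one stream per call and logs which session each audio chunk goes to. `CallSession`, `SessionRegistry`, and `demo` are hypothetical names, and `make_stream` is an assumed factory that you would replace with whatever creates your real per-call input stream.

```python
import io
import logging
import uuid

logger = logging.getLogger("transcription")


class CallSession:
    """State owned by exactly one phone call (hypothetical helper)."""

    def __init__(self, call_id, make_stream):
        self.call_id = call_id
        self.session_id = uuid.uuid4().hex  # local ID, used only for log lines
        self.stream = make_stream()         # fresh stream, never shared
        self.bytes_written = 0

    def write(self, chunk: bytes) -> None:
        self.stream.write(chunk)
        self.bytes_written += len(chunk)
        logger.debug("call=%s session=%s wrote %d bytes",
                     self.call_id, self.session_id, len(chunk))


class SessionRegistry:
    """Maps each call ID to its own CallSession."""

    def __init__(self, make_stream):
        self._make_stream = make_stream
        self._sessions = {}

    def open(self, call_id) -> CallSession:
        if call_id in self._sessions:
            raise ValueError(f"call {call_id!r} already has a session")
        session = CallSession(call_id, self._make_stream)
        self._sessions[call_id] = session
        return session

    def write(self, call_id, chunk: bytes) -> None:
        self._sessions[call_id].write(chunk)

    def close(self, call_id) -> None:
        # Hanging up one call closes only that call's stream.
        session = self._sessions.pop(call_id)
        session.stream.close()
        logger.debug("call=%s session=%s closed", call_id, session.session_id)


def demo():
    registry = SessionRegistry(io.BytesIO)  # io.BytesIO is a stand-in stream
    a = registry.open("call-a")
    b = registry.open("call-b")
    registry.write("call-a", b"aaa")
    registry.write("call-b", b"bb")
    registry.close("call-a")         # hang up call A
    registry.write("call-b", b"b")   # call B keeps streaming
    return a, b
```

If the log shows chunks from two calls landing in the same session, or a hang-up on one call closing the other call's stream, the problem is on the client side rather than in the service.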

To understand what could be wrong with your client setup, you might want to enable logging to check whether the requests are running with different IDs, or whether the audio stream is failing with different recognizer instances.

romungi-MSFT 48,911 Reputation points Microsoft Employee Moderator

2023-04-04T05:29:36.4166667+00:00

Pedro Rodrigues, did you get a chance to check whether enabling logging helped identify the reason for the errors when more than one request is sent?
