Real-time speech-to-text speed improvement

Mohammad Al-Hakim 1

I am wondering whether there is a way to speed up real-time transcription, preferably with synchronous speech recognition, since my audio inputs will be relatively short.

I originally considered building a container so that the model could run locally and reduce latency. However, since this is a personal project, I am unlikely to get approval from Azure.

To give you some context: I am located close to the Australia East servers, have an internet connection faster than 200 Mbps, and am using a high-end PC.

My current speed isn't that bad but I feel it sometimes lags a bit.
If you have any advice for improving the speed I would greatly appreciate it.

Answer 1

Ramr-msft 17,826

@Mohammad Al-Hakim Thanks for the question. Are your audio files of different sizes? Can you please share the test results for the files that took longer to transcribe?
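One way to collect such test results is a small timing harness. The following is only a minimal Python sketch: `measure_latency` is a hypothetical helper name, and `transcribe` is an assumed callable wrapping whatever recognition call you actually use (SDK or REST); neither comes from the Speech service itself.

```python
import statistics
import time


def measure_latency(transcribe, clips, repeats=3):
    """Time a transcription callable on each clip.

    transcribe: assumed callable taking one clip and returning its transcript.
    clips: dict mapping a clip name to whatever `transcribe` expects.
    Returns {name: {"median": seconds, "max": seconds}}.
    """
    results = {}
    for name, audio in clips.items():
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            transcribe(audio)
            timings.append(time.perf_counter() - start)
        results[name] = {
            "median": statistics.median(timings),
            "max": max(timings),
        }
    return results
```

Comparing the median with the maximum per clip helps separate files that are consistently slow from occasional spikes.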

You can use the REST API for Speech to Text to transcribe larger files, and you can run a custom model locally in a Docker container.

https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-services-quotas-and-limits
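For the REST option mentioned above, a request to the speech-to-text REST API for short audio could be built roughly as follows. This is a sketch under assumptions: `build_short_audio_request` is a hypothetical helper, the region and key are placeholders, and the audio is assumed to be 16 kHz PCM WAV. It only builds the request and does not send it.

```python
import urllib.parse
import urllib.request


def build_short_audio_request(region, key, wav_bytes, language="en-US"):
    """Build (but do not send) a POST request for short-audio recognition.

    region, key: placeholders for your Speech resource region and key.
    wav_bytes: raw bytes of a WAV file, assumed 16 kHz PCM.
    """
    query = urllib.parse.urlencode({"language": language, "format": "simple"})
    url = (
        f"https://{region}.stt.speech.microsoft.com"
        f"/speech/recognition/conversation/cognitiveservices/v1?{query}"
    )
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, data=wav_bytes, headers=headers, method="POST")
```

The request can then be sent with `urllib.request.urlopen(req)`. As far as I know, this endpoint accepts only about 60 seconds of audio per request, so please check the documentation before relying on it for longer clips.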

Mohammad Al-Hakim 1 Reputation point

2022-04-19T13:14:15.793+00:00

Thanks for the reply @Ramr-msft. My audio files are all less than 2 minutes long.

The lag is a bit hard to reproduce, but it usually happens when the speech style is spontaneous or unconventional. I guess I am just wondering how to achieve the fastest and most consistent results, to limit lag as much as possible.

As for running the model locally, I imagine that would give me the best results. However, since I am not a Microsoft partner, I doubt I could get access to the speech container, as the form states:

"Possible causes for a denied application are as follows:
a. Not an existing Microsoft Partner or Enterprise Agreement customer - all products with an asterisk next to them are limited to customers with Enterprise Agreements or Partner status."
