
0 Votes
MohammadAlHakim-6320 asked · ramr-msft answered

Real-time speech-to-text speed improvement

I am wondering whether there is a way to speed up real-time transcription, preferably for synchronous speech recognition, since my audio clips will be relatively short.

I originally considered building a container so that the model runs locally and latency drops. However, since this is a personal project, I am unlikely to get approval from Azure.

For context: I am located close to the Australia East servers, have an internet connection faster than 200 Mbps, and am using a high-end PC.

My current speed isn't that bad but I feel it sometimes lags a bit.
If you have any advice for improving the speed I would greatly appreciate it.

azure-speech

1 Answer

1 Vote
ramr-msft answered

@MohammadAlHakim-6320 Thanks for the question. Are your audio files of different sizes? Can you please share the test results for the files that took longer to transcribe?
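To collect such test results, a small timing harness may help. This is only a sketch: `transcribe` is a hypothetical callable that you would supply yourself, wrapping whichever recognition call you currently use, and the summary fields are an arbitrary choice.

```python
import statistics
import time
from typing import Callable, Iterable


def measure_latency(transcribe: Callable[[str], str], paths: Iterable[str]) -> dict:
    """Time each call to `transcribe` and summarise the results in seconds."""
    timings = []
    for path in paths:
        start = time.perf_counter()
        transcribe(path)  # hypothetical wrapper around your recognition call
        timings.append((path, time.perf_counter() - start))
    if not timings:
        raise ValueError("no audio files were given")
    durations = [seconds for _, seconds in timings]
    return {
        "count": len(durations),
        "median": statistics.median(durations),
        "max": max(durations),
        # File that took longest, useful when reporting a slow case.
        "slowest": max(timings, key=lambda item: item[1])[0],
    }
```

Running this over the same set of files several times should show whether the lag is tied to particular recordings or varies from run to run.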

You can use the REST API for Speech to Text to transcribe larger files, and you can run a custom model locally in a Docker container.

https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-services-quotas-and-limits
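For short clips, one REST option is the short-audio speech-to-text endpoint. The sketch below only builds the request and does not send it. It assumes 16 kHz PCM WAV input and a resource in a region such as `australiaeast`; the function name and the key value are placeholders. As far as I know this endpoint accepts only about 60 seconds of audio per request, so longer clips would need to be split or sent through the Speech SDK instead; please check the limits page linked above.

```python
import urllib.parse
import urllib.request


def build_short_audio_request(region: str, key: str, wav_bytes: bytes,
                              language: str = "en-US") -> urllib.request.Request:
    """Build (but do not send) a POST request for the short-audio REST endpoint."""
    query = urllib.parse.urlencode({"language": language, "format": "simple"})
    url = (f"https://{region}.stt.speech.microsoft.com"
           f"/speech/recognition/conversation/cognitiveservices/v1?{query}")
    headers = {
        "Ocp-Apim-Subscription-Key": key,  # your Speech resource key
        # Assumption: the audio is 16 kHz PCM in a WAV container.
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, data=wav_bytes, headers=headers, method="POST")
```

To send it, pass the request to `urllib.request.urlopen(request, timeout=10)` and parse the JSON body; with `format=simple` the recognised text should be in the `DisplayText` field.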



Thanks for the reply, @ramr-msft. My audio files are all less than 2 minutes long.

The lag is hard to reproduce at times, but it usually happens when the speech style is spontaneous or unconventional. I guess I am just wondering how to get the fastest speeds and most consistent results, so that lag is limited as much as possible.

As for running the model locally, I imagine it would give me the best results. However, since I am not a Microsoft partner, I doubt I could get access to the speech container, as the application form states:

"Possible causes for a denied application are as follows:
a. Not an existing Microsoft Partner or Enterprise Agreement customer - all products with an asterisk next to them are limited to customers with Enterprise Agreements or Partner status."
