Speech To Text Performance

Question

What performance can be expected from the Speech-To-Text service? I know this is highly dependent on the resources allocated in the host environment, and probably on the quality of the audio recording.

Is there a baseline performance claim, for example: "Transcription usually happens at 2x real time"?

Accepted Answer

@Alexander G There is no baseline performance claim for speech to text, that is, for how long it takes to process the audio and return the text response. Many factors affect the response time, including the audio quality, the network bandwidth, whether the SDK or the REST API is used, and the pricing tier of the resource.

However, the FAQ mentions a few guidelines that help with performance, and in most cases, including transcription scenarios, the response is fast. For batch transcription, jobs are scheduled on a best-effort basis. You cannot estimate when a job will change into the running state, but under normal system load it should happen within minutes. Once a job is in the running state, transcription proceeds faster than the audio's real-time playback speed.
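Since there is no published baseline, one option is to measure the speed for your own audio and setup. The following is a minimal Python sketch, not part of the Speech service or its SDK: the function names (`real_time_factor`, `speed_multiple`, `time_call`) are hypothetical helpers, and the `transcribe` callable stands in for whatever SDK or REST call you actually use. It compares processing time with audio duration, so "2x real time" corresponds to a speed multiple of 2.0.

```python
import time
from typing import Callable


def real_time_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than playback."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    if processing_seconds < 0:
        raise ValueError("processing_seconds must not be negative")
    return processing_seconds / audio_seconds


def speed_multiple(audio_seconds: float, processing_seconds: float) -> float:
    """Audio duration divided by processing time; 2.0 means '2x real time'."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


def time_call(transcribe: Callable[[], object]) -> float:
    """Return the wall-clock seconds spent inside transcribe().

    `transcribe` is a placeholder for your own transcription call (assumption).
    """
    start = time.perf_counter()
    transcribe()
    return time.perf_counter() - start


if __name__ == "__main__":
    # Example: a 60-second recording that took 30 seconds to transcribe.
    print(real_time_factor(60.0, 30.0))
    print(speed_multiple(60.0, 30.0))
```

For batch transcription, time only the period the job spends in the running state. The wait before the job starts is scheduling delay, which cannot be estimated in advance.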

I hope this helps.
