Speech To Text Performance

Alexander G 21 Reputation points
2021-09-09T06:01:31.387+00:00

What performance can be expected from the Speech-To-Text service? I know this is highly dependent on the resources allocated in the host environment, and probably on the quality of the audio recording.

Is there a baseline performance claim, for example that transcription usually runs at 2x real time?

Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.

Accepted answer
  1. romungi-MSFT 43,696 Reputation points Microsoft Employee
    2021-09-09T10:55:37.497+00:00

    @Alexander G There is no baseline performance claim for speech to text, that is, for processing the audio and returning the text response, because many factors affect the response, including the audio quality, the network bandwidth, whether the SDK or the REST API is used, and the pricing tier of the resource.

    However, the FAQ mentions a few guidelines that help with performance, and in most cases, including transcription scenarios, the response is fast. Batch transcription jobs are scheduled on a best-effort basis. You cannot estimate when a job will change into the running state, but under normal system load it should happen within minutes. Once a job is in the running state, transcription proceeds faster than the audio's real-time playback speed.
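
Since there is no published baseline, one option is to measure the speed for your own audio and setup. The following is a minimal sketch in Python, not an official method: it times a transcription call and reports the real-time factor (processing time divided by audio duration) and the equivalent speedup, so "2x real time" corresponds to a factor of 0.5. The `transcribe` argument is a hypothetical placeholder for whatever call you use (SDK or REST); it is not part of the Speech SDK, and the audio duration is assumed to be known in advance.

```python
import time
from typing import Callable


def real_time_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def speedup(audio_seconds: float, processing_seconds: float) -> float:
    """Audio duration divided by processing time; 2.0 means '2x real time'."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


def measure_rtf(transcribe: Callable[[], object], audio_seconds: float) -> float:
    """Time one call to `transcribe` (hypothetical placeholder) and return its real-time factor."""
    start = time.perf_counter()
    transcribe()  # assumption: blocks until the transcription result is available
    elapsed = time.perf_counter() - start
    return real_time_factor(audio_seconds, elapsed)
```

For example, a 60-second file that takes 30 seconds to transcribe gives `real_time_factor(60.0, 30.0) == 0.5` and `speedup(60.0, 30.0) == 2.0`. For batch transcription, the measurement should start when the job enters the running state, because the queueing time before that cannot be estimated.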

    I hope this helps.

