I have just discovered the speech-to-text APIs and I'm amazed by them.
I use a Power Automate flow to transcribe the audio files.
I have noticed, though, that the time needed for a transcription varies considerably. A fresh example: an audio file of length 1:14:36 (mono) POSTed for transcription was successfully transcribed in 01:19:17, whereas a file lasting 03:14:59 (mono) has been running for 11:22:05 (and counting).
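To put a number on the difference, here is a quick Python sketch that computes the ratio of processing time to audio length for the two jobs above. The helper names `to_seconds` and `processing_ratio` are just mine for illustration, and the second ratio is only a lower bound because that job is still running.

```python
def to_seconds(hms: str) -> int:
    """Convert an H:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def processing_ratio(audio_length: str, processing_time: str) -> float:
    """Seconds of processing spent per second of audio."""
    return to_seconds(processing_time) / to_seconds(audio_length)


# Durations copied from the two jobs described above.
short_ratio = processing_ratio("1:14:36", "01:19:17")
long_ratio = processing_ratio("03:14:59", "11:22:05")  # lower bound, job not finished

print(f"short file: {short_ratio:.2f}x the audio length")   # short file: 1.06x the audio length
print(f"long file: >= {long_ratio:.2f}x the audio length")  # long file: >= 3.50x the audio length
```

So the shorter file took roughly its own duration to transcribe, while the longer one is already at about 3.5 times its duration.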
The parameters provided are always the same, namely:
So my question is: where can I find information on what affects the speed of the transcription?
Thank you in advance!
Addendum: could it be an issue with diarization, possibly with more than 2 speakers (the audio files are usually extracted from meetings)?