Hello,
I've trained a custom STT model using Azure Speech Services. I'm currently testing it with REST API requests, as in the How to Recognize Speech (REST) docs. What I've noticed is that transcription time grows with audio duration. That makes sense, but a 7 s audio takes around 1.6 s, while a 15 s audio takes 3.2 s (it doubles!). Is this the typical behavior? I've also tested Whisper, and it is far faster: around 1 s for 15 s audios.
How can I decrease the response time? Is it even possible to decrease it? I changed the region where the deployment was made (it was West Europe, and I also deployed it in Norway), but the times are practically identical.
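For reference, the times I report below are simple end-to-end wall-clock measurements around the REST call. A minimal sketch of that kind of timing loop (the key, region, and file name are placeholders; for the custom model you would use the endpoint URL and parameters shown on its Custom Speech deployment page) looks roughly like this:

```python
import time
import requests

# Placeholders: substitute your real key, region, and audio file.
SPEECH_KEY = "<your-speech-key>"
REGION = "westeurope"
URL = f"https://{REGION}.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1"

headers = {
    "Ocp-Apim-Subscription-Key": SPEECH_KEY,
    "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
    "Accept": "application/json",
}
params = {"language": "en-US"}

with open("sample_7s.wav", "rb") as f:  # hypothetical test file
    audio = f.read()

# Time the whole request/response round trip.
start = time.perf_counter()
resp = requests.post(URL, headers=headers, params=params, data=audio)
elapsed = time.perf_counter() - start

print(f"status={resp.status_code} elapsed={elapsed:.3f}s")
print(resp.json().get("DisplayText"))
```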
Below is a table comparing the Azure standard model and a fine-tuned version of it, both deployed in the West Europe and Norway regions and used to transcribe audios of up to 15 seconds. For each duration bucket (0, 1, ..., 15 seconds) I used 40 audio samples, and each cell is the average time over those 40 samples, in seconds (a sketch of the aggregation follows the table).
| duration (s) | standard, West Europe (s) | standard, Norway (s) | fine-tuned, West Europe (s) | fine-tuned, Norway (s) |
|---------------:|-------------------------:|-----------------------------:|-------------------:|-----------------------:|
| 0 | 0.403277 | 0.38325 | 0.419045 | 0.373875 |
| 1 | 0.58759 | 0.576389 | 0.65832 | 0.721355 |
| 2 | 0.84353 | 0.766682 | 0.799341 | 0.949132 |
| 3 | 1.0572 | 0.928268 | 1.04376 | 0.93834 |
| 4 | 1.04866 | 1.03904 | 1.06173 | 1.08996 |
| 5 | 1.2742 | 1.21688 | 1.31663 | 1.25482 |
| 6 | 1.39373 | 1.44435 | 1.43455 | 1.49134 |
| 7 | 1.55196 | 1.57978 | 1.63983 | 1.59788 |
| 8 | 1.7812 | 1.80114 | 1.8757 | 1.83565 |
| 9 | 1.92388 | 1.94428 | 2.00655 | 1.94672 |
| 10 | 2.17438 | 2.10336 | 2.23532 | 2.12242 |
| 11 | 2.43509 | 2.42098 | 2.50319 | 2.35365 |
| 12 | 2.67719 | 2.52439 | 2.56752 | 2.53058 |
| 13 | 2.90506 | 2.804 | 2.79939 | 2.75962 |
| 14 | 2.90891 | 2.93892 | 2.97066 | 3.003 |
| 15 | 3.18507 | 3.18829 | 3.32962 | 3.23664 |
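For completeness, the aggregation behind the table is nothing more than a mean per duration bucket. A minimal pandas sketch (the CSV file name and column names are hypothetical) would look something like this:

```python
import pandas as pd

# Raw measurements: one row per (model, region, audio file) request.
# Hypothetical layout: duration_int, model_region, elapsed_s
df = pd.read_csv("latency_measurements.csv")

# Average the ~40 samples per duration bucket and pivot into the table layout above.
table = (
    df.pivot_table(index="duration_int", columns="model_region",
                   values="elapsed_s", aggfunc="mean")
      .round(6)
)

# Requires the 'tabulate' package for markdown output.
print(table.to_markdown())
```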
Thanks in advance!
Bruno