Excessive time on Custom STT transcriptions

Bruno Goncalves Vaz (P) 20 Reputation points
2024-07-15T13:27:48.99+00:00

Hello 🙂

I've trained a custom STT model using Azure Speech Services. I'm currently testing it with REST API requests, as described in the How to Recognize Speech (REST) docs. What I've noticed is that the transcription time grows with the length of the audio. That makes sense, but a 7 s audio takes around 1.6 s, while a 15 s audio takes around 3.2 s (it doubles!). Is this the typical behavior? I've also tested Whisper, and it takes far less time: around 1 s for 15 s audios.
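
For reference, a minimal sketch of this kind of timing test (the key, region, locale, file name, and the `cid` custom-endpoint parameter are placeholders; the exact URL and parameter come from the Custom Speech deployment page):

```python
import time

import requests

REGION = "westeurope"                      # or "norwayeast"
SPEECH_KEY = "<your-speech-resource-key>"
ENDPOINT_ID = "<your-custom-endpoint-id>"  # from the Custom Speech deployment page

url = f"https://{REGION}.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1"
params = {
    "language": "en-US",   # placeholder locale
    "cid": ENDPOINT_ID,    # custom-endpoint parameter as shown on the deployment page
}
headers = {
    "Ocp-Apim-Subscription-Key": SPEECH_KEY,
    "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
    "Accept": "application/json",
}

with open("sample_15s.wav", "rb") as f:    # placeholder file name
    audio = f.read()

start = time.perf_counter()
response = requests.post(url, params=params, headers=headers, data=audio)
elapsed = time.perf_counter() - start

print(f"HTTP {response.status_code} in {elapsed:.3f}s")
print(response.json().get("DisplayText"))
```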

How can I decrease the response time? Is it even possible to decrease it? I've changed the region where the deployment was made (it was West Europe and I deployed it in Norway), but the times are identical.

Below is a table comparing the Azure standard model and a fine-tuned version of that model. Both are deployed in the West Europe and Norway regions and are used to transcribe audios of up to 15 seconds. For each audio duration I used 40 audio samples, and the times shown are the average across those samples (a sketch of how the averages are computed follows the table).

| duration (s) | standard, West Europe (s) | standard, Norway (s) | fine-tuned, West Europe (s) | fine-tuned, Norway (s) |
|-------------:|--------------------------:|---------------------:|----------------------------:|-----------------------:|
|              0 |                 0.403277 |                     0.38325  |           0.419045 |               0.373875 |
|              1 |                 0.58759  |                     0.576389 |           0.65832  |               0.721355 |
|              2 |                 0.84353  |                     0.766682 |           0.799341 |               0.949132 |
|              3 |                 1.0572   |                     0.928268 |           1.04376  |               0.93834  |
|              4 |                 1.04866  |                     1.03904  |           1.06173  |               1.08996  |
|              5 |                 1.2742   |                     1.21688  |           1.31663  |               1.25482  |
|              6 |                 1.39373  |                     1.44435  |           1.43455  |               1.49134  |
|              7 |                 1.55196  |                     1.57978  |           1.63983  |               1.59788  |
|              8 |                 1.7812   |                     1.80114  |           1.8757   |               1.83565  |
|              9 |                 1.92388  |                     1.94428  |           2.00655  |               1.94672  |
|             10 |                 2.17438  |                     2.10336  |           2.23532  |               2.12242  |
|             11 |                 2.43509  |                     2.42098  |           2.50319  |               2.35365  |
|             12 |                 2.67719  |                     2.52439  |           2.56752  |               2.53058  |
|             13 |                 2.90506  |                     2.804    |           2.79939  |               2.75962  |
|             14 |                 2.90891  |                     2.93892  |           2.97066  |               3.003    |
|             15 |                 3.18507  |                     3.18829  |           3.32962  |               3.23664  |
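
For completeness, a minimal sketch of how an average-latency table like this can be produced with pandas, assuming the raw timings are collected as (model, region, duration, elapsed) rows (the values below are dummies; in the real test each cell averages 40 samples):

```python
import pandas as pd

# Dummy measurements for illustration: (model, region, duration_s, elapsed_s).
measurements = [
    ("standard", "westeurope", 7, 1.55),
    ("standard", "westeurope", 7, 1.58),
    ("finetuned", "norway", 15, 3.31),
    ("finetuned", "norway", 15, 3.35),
]

df = pd.DataFrame(measurements, columns=["model", "region", "duration_s", "elapsed_s"])

# Average the samples for each duration, then spread model/region into columns
# to get the same shape as the table above.
table = (
    df.groupby(["duration_s", "model", "region"])["elapsed_s"]
      .mean()
      .unstack(["model", "region"])
)
print(table.round(3))
```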

Thanks in advance!

Bruno

Azure AI Speech

1 answer

  1. Azar 22,845 Reputation points MVP
    2024-07-15T17:33:22.1466667+00:00

    Hi there Bruno Goncalves Vaz,

    Thanks for using the Q&A platform.

    It's normal for transcription time to increase with longer audio, but your Azure custom STT model does seem slower than Whisper. To reduce response time, make sure your audio files are pre-processed correctly, and consider splitting longer files into smaller chunks for parallel processing (a rough sketch follows below). Check that you're using an appropriate service tier (which I believe you are) and that your model has sufficient resources. While you've tested different regions, also consider network latency and choose the fastest one. Try optimizing your custom model and API calls, and consider using the asynchronous (batch transcription) API for longer files.
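
    As a rough sketch of the chunking idea, assuming the chunks have already been split beforehand (ideally on silence) and reusing the same short-audio REST call with placeholder key, region, locale, and endpoint ID; this is just an illustration, not an official pattern:

    ```python
    from concurrent.futures import ThreadPoolExecutor

    import requests

    REGION = "westeurope"
    SPEECH_KEY = "<your-speech-resource-key>"
    URL = f"https://{REGION}.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1"
    PARAMS = {"language": "en-US", "cid": "<your-custom-endpoint-id>"}
    HEADERS = {
        "Ocp-Apim-Subscription-Key": SPEECH_KEY,
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Accept": "application/json",
    }

    def transcribe(path: str) -> str:
        """POST one WAV chunk to the short-audio endpoint and return its text."""
        with open(path, "rb") as f:
            resp = requests.post(URL, params=PARAMS, headers=HEADERS, data=f.read())
        resp.raise_for_status()
        return resp.json().get("DisplayText", "")

    # Chunks are assumed to be pre-split (e.g. on silence with ffmpeg); naive
    # fixed-length cuts can split words and hurt accuracy.
    chunk_paths = ["part0.wav", "part1.wav", "part2.wav"]

    with ThreadPoolExecutor(max_workers=4) as pool:
        texts = list(pool.map(transcribe, chunk_paths))  # map() keeps the input order

    print(" ".join(texts))
    ```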

    If nothing works, I suggest you raise a support request in Azure:

    https://portal.azure.com/?quickstart=true#view/Microsoft_Azure_Support/HelpAndSupportBlade/~/overview

    If this helps, kindly accept the answer. Thanks!