Hello!
Thank you for posting on Microsoft Learn.
You're not alone in noticing that Azure Speech to Text (Fast Transcription) performance can vary even for the same file.
Performance can vary by region, especially during peak usage hours. Azure allocates resources differently across regions, so a busier region may have queue delays.
If you're using the F0 (free) tier or a low-priority quota, your jobs may be deprioritized in resource allocation, so when demand exceeds available compute, they may wait longer to be scheduled.
If you're using Fast Transcription with multilingual or custom models, note that it's still in preview. Preview features often run on experimental infrastructure that has not been scaled out, which makes performance more variable.
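To check which of these factors applies to you, it can help to time the same file several times per region and compare the spread. Below is a minimal sketch of such a timing harness. The names `time_calls`, `summarize`, and `compare_regions` are hypothetical helpers, not part of any Azure SDK, and each `submit` callable is assumed to be your own function that sends the file to Fast Transcription in one region and blocks until the result comes back.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_calls(submit: Callable[[], object], runs: int = 5) -> List[float]:
    """Call `submit` `runs` times; return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        submit()  # assumed to block until the transcription result is returned
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> Dict[str, float]:
    """Reduce a list of durations to min / median / max."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


def compare_regions(
    submitters: Dict[str, Callable[[], object]], runs: int = 5
) -> Dict[str, Dict[str, float]]:
    """Time each region's submit function and summarize the results per region."""
    return {
        region: summarize(time_calls(submit, runs))
        for region, submit in submitters.items()
    }
```

A large gap between min and max within one region points to queueing or scheduling delays, while a consistent difference in medians between regions suggests trying a less busy region. Running this at different times of day should also show whether peak hours are a factor.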