High egress/bandwidth usage with Batch Transcription contentUrls - Does the service download files multiple times?
Hi everyone,
I’m working with the Azure AI Speech Batch Transcription API and running into an issue with unexpected bandwidth consumption.
Setup:
We are submitting transcription jobs using the contentUrls property, pointing to audio files hosted on our own external storage (non-Azure).
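For context, the request body looks roughly like the sketch below. It only builds the JSON payload and sends nothing. The `build_transcription_request` helper, the `storage.example.com` URLs, the locale, and the display name are placeholders for illustration, not our real values.

```python
import json


def build_transcription_request(content_urls, locale="en-US", display_name="batch-job"):
    """Build the JSON body for a Batch Transcription create request.

    Hypothetical helper for illustration; only the field names
    (contentUrls, locale, displayName) come from the API.
    """
    if not content_urls:
        raise ValueError("at least one content URL is required")
    return {
        "contentUrls": list(content_urls),  # one entry per externally hosted audio file
        "locale": locale,
        "displayName": display_name,
    }


# Placeholder URLs standing in for files on our non-Azure storage.
body = build_transcription_request([
    "https://storage.example.com/audio/call-001.wav",
    "https://storage.example.com/audio/call-002.wav",
])
print(json.dumps(body, indent=2))
```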
Problem:
We are noticing a significant spike in egress traffic from our storage that doesn't add up. The total bandwidth consumed is notably higher than the total size of the audio files we are submitting.
We don't have granular access logs on the storage side to pinpoint exact request counts, but the traffic volume suggests the Speech service is accessing or downloading the same file multiple times per transaction.
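Since the storage side gives us no request-level logs, one way to get hard numbers would be to point a test job at a small server that counts every request it receives. The following is a minimal sketch using only the Python standard library. The names `CountingHandler`, `start_counting_server`, and `request_log` are made up for illustration, the payload is dummy bytes and not real audio, and the server would have to be reachable from the Speech service (for example behind a public tunnel), which is an assumption about the test setup.

```python
import threading
from collections import Counter
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

PAYLOAD = b"\0" * 1024  # stand-in for a test file; replace with real audio bytes
request_log = Counter()  # (method, path, Range header or None) -> request count
log_lock = threading.Lock()


class CountingHandler(BaseHTTPRequestHandler):
    """Serves the same payload for every path and tallies each request."""

    def _record(self):
        key = (self.command, self.path, self.headers.get("Range"))
        with log_lock:
            request_log[key] += 1

    def _send_headers(self):
        # Range headers are recorded but not honoured: the full body is always
        # returned with 200, which is acceptable for a counting experiment.
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()

    def do_HEAD(self):
        self._record()
        self._send_headers()

    def do_GET(self):
        self._record()
        self._send_headers()
        self.wfile.write(PAYLOAD)

    def log_message(self, format, *args):
        pass  # suppress the default per-request stderr output


def start_counting_server(host="127.0.0.1", port=0):
    """Start the server in a daemon thread; port=0 lets the OS pick a free port."""
    server = ThreadingHTTPServer((host, port), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

After a test job finishes, inspecting `request_log` would show how many `HEAD` and `GET` requests arrived for each path and whether any of them carried a `Range` header.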
My Questions:
- Is this expected behaviour? Does the Batch Transcription service perform multiple passes (e.g., specific `HEAD` probes for metadata followed by the `GET`, or separate downloads for different processing stages)?
- Retry Logic: If the service encounters a transient network issue, does it restart the download from scratch?
- Documentation: I’ve looked through the Batch Transcription docs but can't find any info regarding "single-access" guarantees or retry behaviour for `contentUrls`.
Has anyone else experienced this "traffic multiplier" when hosting files externally?
Thanks!