High egress/bandwidth usage with Batch Transcription contentUrls - Does the service download files multiple times?

Antonio Feregrino 0 Reputation points
2025-12-04T18:02:22.35+00:00

Hi everyone,

I’m working with the Azure AI Speech Batch Transcription API and running into an issue with unexpected bandwidth consumption.

Setup:

We are submitting transcription jobs using the contentUrls property, pointing to audio files hosted on our own external storage (non-Azure).

Problem:

We are noticing a significant spike in egress traffic from our storage that doesn't add up. The total bandwidth consumed is notably higher than the total size of the audio files we are submitting.

We don't have granular access logs on the storage side to pinpoint exact request counts, but the traffic volume suggests the Speech service is accessing or downloading the same file multiple times per transaction.

My Questions:

  1. Is this expected behaviour? Does the Batch Transcription service perform multiple passes (e.g., specific HEAD probes for metadata followed by the GET, or separate downloads for different processing stages)?
  2. Retry Logic: If the service encounters a transient network issue, does it restart the download from scratch?
  3. Documentation: I’ve looked through the Batch Transcription docs but can't find any info regarding "single-access" guarantees or retry behaviour for contentUrls.

Has anyone else experienced this "traffic multiplier" when hosting files externally?

Thanks!

Azure Speech in Foundry Tools

3 answers

  1. Michael Dereje 0 Reputation points
    2026-01-20T13:59:20.86+00:00

    Hi @Manas Mohanty, since we cannot enforce or control how Azure/OpenAI Whisper downloads files from our pre-signed URLs, I'm wondering about the internal implementation. Can you tell me whether Azure uses Range headers, parallel chunk downloads, or single full-file downloads? Or is that an implementation detail on Azure's side that is not exposed to users?

  2. Michael Dereje 0 Reputation points
    2025-12-18T13:45:53.8866667+00:00

    Hi Manas Mohanty,

    Thank you for the speedy response.

    Based on AWS S3 access logs, we are seeing the following pattern for each transcription job:

    1. Multiple Failed HEAD Requests
    • HTTP Method: HEAD
    • Response: 403 SignatureDoesNotMatch
    • Source IPs: 51.143.212.173, 51.104.27.67, 51.143.212.172 (Microsoft Azure)
    • User Agent: azsdk-net-Storage.Blobs/12.14.1 (.NET 8.0.18; CBL-Mariner/Linux)

    Sample log entry:

    REST.HEAD.OBJECT recordings/RE27beca6698f5bd1d202342bcd8aeb3e0.wav
    HEAD /recordings/RE27beca6698f5bd1d202342bcd8aeb3e0.wav?response-content-type=...
    403 SignatureDoesNotMatch
    User-Agent: "azsdk-net-Storage.Blobs/12.14.1 (.NET 8.0.18; CBL-Mariner/Linux)"
    
    2. Multiple Successful GET Requests
    • HTTP Method: GET
    • Response: 200 OK
    • Source IPs: Same Microsoft Azure IPs
    • Bytes Transferred: Full file size (e.g., 2,026,284 bytes) per request

    Sample log entry:

    REST.GET.OBJECT recordings/RE27beca6698f5bd1d202342bcd8aeb3e0.wav
    GET /recordings/RE27beca6698f5bd1d202342bcd8aeb3e0.wav?response-content-type=...
    200 - 2026284 bytes
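
For anyone who wants to quantify this pattern from their own logs, here is a minimal sketch that tallies requests and bytes per object. It assumes the standard space-delimited S3 server access log format; the sample lines, bucket name, owner ID, request IDs, and timestamps are made up for illustration and are not taken from the real logs above.

```python
import re
from collections import Counter, defaultdict

# Leading fields of an S3 server access log record (assumed standard layout):
# owner bucket [time] ip requester request-id operation key "request-uri"
# status error-code bytes-sent object-size ...
LOG_RE = re.compile(
    r'^\S+ \S+ \[[^\]]+\] (?P<ip>\S+) \S+ \S+ (?P<op>\S+) (?P<key>\S+) '
    r'"[^"]*" (?P<status>\S+) (?P<error>\S+) (?P<bytes>\S+) (?P<size>\S+)'
)

def summarize(lines):
    """Count requests per (key, operation, status) and compute an egress multiplier."""
    counts = Counter()
    bytes_sent = defaultdict(int)
    object_size = {}
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that do not look like access log records
        key = m["key"]
        counts[(key, m["op"], m["status"])] += 1
        if m["bytes"].isdigit():   # "-" means no body was sent
            bytes_sent[key] += int(m["bytes"])
        if m["size"].isdigit():
            object_size[key] = int(m["size"])
    # Multiplier: total bytes served for an object divided by its size.
    multiplier = {k: bytes_sent[k] / s for k, s in object_size.items() if s}
    return counts, dict(bytes_sent), multiplier

KEY = "recordings/example.wav"  # hypothetical object key
SAMPLE = [  # hypothetical records, shaped like the entries described above
    f'ownerid example-bucket [01/Jan/2025:00:00:01 +0000] 192.0.2.10 - REQ1 '
    f'REST.HEAD.OBJECT {KEY} "HEAD /{KEY} HTTP/1.1" 403 SignatureDoesNotMatch '
    f'- - 5 - "-" "azsdk-net-Storage.Blobs/12.14.1" -',
    f'ownerid example-bucket [01/Jan/2025:00:00:02 +0000] 192.0.2.10 - REQ2 '
    f'REST.GET.OBJECT {KEY} "GET /{KEY} HTTP/1.1" 200 - 2026284 2026284 '
    f'40 10 "-" "azsdk-net-Storage.Blobs/12.14.1" -',
    f'ownerid example-bucket [01/Jan/2025:00:00:03 +0000] 192.0.2.11 - REQ3 '
    f'REST.GET.OBJECT {KEY} "GET /{KEY} HTTP/1.1" 200 - 2026284 2026284 '
    f'42 11 "-" "azsdk-net-Storage.Blobs/12.14.1" -',
]

counts, total_bytes, multiplier = summarize(SAMPLE)
for (key, op, status), n in sorted(counts.items()):
    print(f"{key} {op} {status}: {n} request(s)")
print(multiplier)
```

A multiplier well above 1 for an object would confirm that the full file is being served more than once per job.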
    

    Questions and Concerns

    1. Why is the service making HEAD requests? The pre-signed URLs we generate are specifically signed for GET requests only. AWS S3 pre-signed URLs are method-specific, which means a URL signed for GET will not authenticate HEAD requests (resulting in the SignatureDoesNotMatch errors we're seeing).

    Question: Why does the Azure Batch Transcription Service attempt HEAD requests? Is this for metadata retrieval, connection testing, or another purpose?

    2. Why are there ~500 requests per file? For a single audio file transcription, we're seeing hundreds of requests being made.

    Question: What is causing this volume of requests? Is the service:

    • Retrying failed requests excessively?
    • Downloading the file multiple times?
    • Using a chunked download strategy with poor efficiency?
    • Experiencing internal errors that trigger re-downloads?

    3. How should we configure pre-signed URLs for Azure compatibility?

    Questions:

    • Does the Azure service require both HEAD and GET access?
    • Is there documentation on the expected HTTP methods the service will use?
    • Is there a recommended approach for pre-signed URL configuration with your service?
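
To illustrate why a GET-signed URL fails for HEAD, here is a rough sketch of the Signature Version 4 query-string calculation using only the Python standard library. The HTTP method is the first line of the canonical request, so changing it changes the signature. The credentials, host, region, and timestamp below are placeholders, and the sketch omits extras such as `response-content-type` and session tokens; in practice you would use an AWS SDK rather than hand-rolled signing.

```python
import hashlib
import hmac
from urllib.parse import quote

def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode(), hashlib.sha256).digest()

def presign_signature(method, host, path, access_key, secret_key,
                      region, amz_date, expires=3600):
    """Compute X-Amz-Signature for an S3 pre-signed request (simplified sketch)."""
    date = amz_date[:8]
    scope = f"{date}/{region}/s3/aws4_request"
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    query = "&".join(
        f"{quote(k, safe='')}={quote(v, safe='')}"
        for k, v in sorted(params.items())
    )
    canonical_request = "\n".join([
        method,                    # the HTTP method is part of what gets signed
        quote(path, safe="/"),
        query,
        f"host:{host}\n",          # canonical headers, newline-terminated
        "host",                    # signed headers
        "UNSIGNED-PAYLOAD",
    ])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode()).hexdigest(),
    ])
    # Derive the signing key: date -> region -> service -> "aws4_request".
    key = _hmac(("AWS4" + secret_key).encode(), date)
    for part in (region, "s3", "aws4_request"):
        key = _hmac(key, part)
    return hmac.new(key, string_to_sign.encode(), hashlib.sha256).hexdigest()

# Placeholder inputs, not real credentials or a real bucket.
common = dict(
    host="example-bucket.s3.amazonaws.com",
    path="/recordings/example.wav",
    access_key="AKIDEXAMPLE",
    secret_key="not-a-real-secret",
    region="us-east-1",
    amz_date="20250101T000000Z",
)
sig_get = presign_signature("GET", **common)
sig_head = presign_signature("HEAD", **common)
print("GET :", sig_get)
print("HEAD:", sig_head)
print("same signature:", sig_get == sig_head)
```

Because the two signatures differ, S3 rejects a HEAD sent to a GET-signed URL with SignatureDoesNotMatch, which matches the 403 responses in the logs.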

  3. Manas Mohanty 17,180 Reputation points Microsoft External Staff Moderator
    2025-12-09T05:01:01.33+00:00

    Hi Antonio Feregrino

    In a few cases, slowness has been observed when Azure services communicate with non-Azure services.

    Here is a follow-up on your queries:

    Is this expected behaviour? Does the Batch Transcription service perform multiple passes (e.g., specific HEAD probes for metadata followed by the GET, or separate downloads for different processing stages)?

    Yes, it can. The Batch Transcription service may perform multiple accesses to the audio files:

    • HEAD requests to check metadata before downloading.
    • GET requests for the actual download.

    Retry Logic: If the service encounters a transient network issue, does it restart the download from scratch?

    1. If the service encounters a transient network issue, it restarts the download from scratch rather than resuming from the previous point.
    2. This means the same file could be downloaded multiple times during retries, increasing bandwidth usage. [learn.microsoft.com]
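
As a rough illustration of what restart-from-scratch retries mean for egress, the sketch below adds up the bytes served when each failed attempt is abandoned partway and the final attempt transfers the whole file. The failure points are invented numbers; only the file size comes from this thread.

```python
def egress_with_restarts(file_size: int, bytes_before_each_failure: list[int]) -> int:
    """Total bytes served when every failed attempt is discarded and restarted."""
    # Each failed attempt still costs the bytes sent before it broke off,
    # and the final successful attempt transfers the whole file again.
    return sum(bytes_before_each_failure) + file_size

file_size = 2_026_284                  # size of the example file in this thread
failures = [1_000_000, 500_000]        # hypothetical: two attempts failed partway
total = egress_with_restarts(file_size, failures)
print(total, "bytes served,", round(total / file_size, 2), "x the file size")
```

With resumable downloads (for example via Range requests) the total would stay close to the file size, so the gap between the two figures is the cost of restarting.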

    Documentation: I’ve looked through the Batch Transcription docs but can't find any info regarding "single-access" guarantees or retry behaviour for contentUrls.

    1. Current official documentation does not guarantee single-access behavior or detail retry logic for contentUrls.
    2. It only specifies that files must be publicly accessible or provided via SAS URI and that jobs are processed asynchronously.

    Reference - https://learn.microsoft.com/en-us/azure/ai-services/speech-service/batch-transcription-audio-data?tabs=portal#trusted-azure-services-security-mechanism

    Recommendations for reducing egress

    1. Use Azure Blob Storage with SAS URIs – This minimizes latency and retries compared to external storage.
    2. Enable logging on your storage – To confirm request counts and patterns.
    3. Consider smaller batches – Large batches increase concurrency and potential retries.
    4. Monitor network stability – Reduce transient errors that trigger full re-downloads.
    5. Enable retry logic
    6. Whitelist the external storage URL if the resources are secured behind a virtual network.
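
For the "smaller batches" suggestion, one possible approach is to split a long URL list into several smaller transcription requests. The sketch below only builds the JSON bodies and does not call the service. The `contentUrls`, `locale`, and `displayName` fields follow the Speech to text REST API request shape as I understand it, while the batch size, display names, and URLs are arbitrary illustrative choices.

```python
import json

def build_jobs(content_urls, batch_size=10, locale="en-US"):
    """Split a list of audio URLs into smaller transcription request bodies."""
    jobs = []
    for start in range(0, len(content_urls), batch_size):
        chunk = content_urls[start:start + batch_size]
        jobs.append({
            "contentUrls": chunk,
            "locale": locale,
            # Hypothetical naming scheme, only to tell the jobs apart.
            "displayName": f"batch-{start // batch_size + 1}",
        })
    return jobs

# Placeholder URLs; real ones would be SAS or pre-signed links.
urls = [f"https://storage.example.com/recordings/{i}.wav" for i in range(25)]
jobs = build_jobs(urls, batch_size=10)
for job in jobs:
    print(job["displayName"], len(job["contentUrls"]), "file(s)")
print(json.dumps(jobs[-1], indent=2)[:200])
```

Each body would then be submitted as its own transcription job, which makes it easier to correlate storage-side request counts with a specific, small set of files.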

    Hope it helps.

    Thank you
