Azure OpenAI Realtime client_secrets returns 500 when input_audio_transcription is included (Sweden Central)

James Morgan 0 Reputation points
2026-06-02T11:21:18.4166667+00:00

We are seeing a consistent server-side failure in Azure OpenAI Realtime when requesting client secrets with input_audio_transcription enabled.

Environment

  • Region: Sweden Central
  • Resource: LBBD-OpenAI-Sweden-Dev
  • Subscription: 52a9fd6d-324a-4cc7-861d-b17e4cf9c219
  • API path: /openai/v1/realtime/client_secrets
  • Auth: Managed Identity (DefaultAzureCredential)
  • Deployment tested: gpt-4o-mini-transcribe-sweden-dev-v2 (fresh deployment name)

Observed behavior

  1. Request WITH input_audio_transcription in session payload -> HTTP 500
  2. Same request WITHOUT input_audio_transcription -> HTTP 200

This is reproducible both directly against the endpoint and through our app route that mints realtime tokens.
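A minimal sketch of the A/B comparison described above, so both request bodies can be inspected side by side. The `session.type` value, the `model` key, and the realtime deployment name are assumptions for illustration; only the `input_audio_transcription` field and the transcription deployment name come from this post.

```python
import copy
import json

# Hypothetical realtime deployment name (assumption); replace with your own.
REALTIME_DEPLOYMENT = "my-realtime-deployment"
# Transcription deployment named in this post.
TRANSCRIBE_DEPLOYMENT = "gpt-4o-mini-transcribe-sweden-dev-v2"


def build_payloads(realtime_model, transcription_model):
    """Return (with_transcription, without_transcription) request bodies.

    The two bodies are identical except for input_audio_transcription.
    The session shape used here is an assumed, simplified one.
    """
    base = {"session": {"type": "realtime", "model": realtime_model}}
    with_transcription = copy.deepcopy(base)
    with_transcription["session"]["input_audio_transcription"] = {
        "model": transcription_model
    }
    return with_transcription, base


def payload_diff(a, b):
    """List the session keys present in one payload but not the other."""
    return sorted(set(a["session"]) ^ set(b["session"]))


if __name__ == "__main__":
    with_t, without_t = build_payloads(REALTIME_DEPLOYMENT, TRANSCRIBE_DEPLOYMENT)
    print(json.dumps(with_t, indent=2))
    print(json.dumps(without_t, indent=2))
    print("differs only in:", payload_diff(with_t, without_t))
```

Posting each body to /openai/v1/realtime/client_secrets with the same auth and api-version isolates the field as the only variable.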

What we already checked

  • payload structure
  • deployment recreation with new name
  • same auth and api-version across both requests
  • retries and fallback path

Question

Is there a known region-specific issue or feature-gating requirement for input_audio_transcription in Realtime session creation on Azure OpenAI? If not, what exact prerequisites are required for this field to work?

Azure Speech in Foundry Tools

1 answer

  1. AI answer

    2026-06-02T11:21:39.95+00:00

    No region-specific or feature-gating prerequisite for input_audio_transcription is documented for Azure OpenAI Realtime beyond using the correct models and schema. A 500 response when adding input_audio_transcription points to a service-side error: validation or configuration problems in the request would normally surface as a 4xx response, not a 500.

    From the Realtime REST reference:

    • input_audio_transcription is an optional configuration object on the Realtime session. It “defaults to off and can be set to null to turn off once on.”
    • It is not tied to a special feature flag in the public documentation.
    • The supported transcription models for this property are:
      • whisper-1
      • gpt-4o-mini-transcribe
      • gpt-4o-mini-transcribe-2025-12-15
      • gpt-4o-transcribe
      • gpt-4o-transcribe-diarize

    The only documented prerequisites for using input_audio_transcription are therefore:

    1. Use the GA Realtime API endpoints and protocol
      • Path: POST {endpoint}/openai/v1/realtime/client_secrets
      • Optional api-version query: v1 or preview (defaults to v1 if omitted).
    2. Provide a valid Realtime session payload
      • session.type must be set appropriately for a Realtime session.
      • If input_audio_transcription is included, its model value must be one of the supported transcription models listed above.
      • Optional fields like language and prompt follow the documented schema.
    3. Use a supported Realtime model deployment
      • For WebRTC/Realtime, use one of the Realtime models listed for the region (for example gpt-4o-mini-realtime-preview, gpt-4o-realtime-preview, gpt-realtime, or gpt-realtime-mini as documented for East US 2 and Sweden Central).
      • Ensure the deployment name in the session configuration matches the actual deployment.
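The payload checks in the list above can be sketched as a small pre-flight validator. This is an illustration only: the model list is copied from this answer, while the function name and the flat `session` dictionary shape are assumptions, not part of any Azure SDK.

```python
# Transcription models listed in this answer as supported.
SUPPORTED_TRANSCRIPTION_MODELS = {
    "whisper-1",
    "gpt-4o-mini-transcribe",
    "gpt-4o-mini-transcribe-2025-12-15",
    "gpt-4o-transcribe",
    "gpt-4o-transcribe-diarize",
}


def check_session(session):
    """Return a list of problems found in a session dict (empty if none).

    Hypothetical helper: it only mirrors the documented prerequisites
    and does not call the service.
    """
    problems = []
    if not session.get("type"):
        problems.append("session.type is missing")

    transcription = session.get("input_audio_transcription")
    if transcription is None:
        # Field omitted or explicitly null: transcription stays off.
        return problems
    if not isinstance(transcription, dict):
        problems.append("input_audio_transcription must be an object or null")
        return problems

    model = transcription.get("model")
    if model not in SUPPORTED_TRANSCRIPTION_MODELS:
        problems.append(f"unsupported transcription model: {model!r}")
    return problems
```

Note that a custom deployment name such as gpt-4o-mini-transcribe-sweden-dev-v2 would be flagged by this check. Whether the service accepts a deployment name in this field is not stated in the documentation quoted here, so trying the plain model name is a reasonable experiment.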

    Given that:

    • The same request succeeds (200) when input_audio_transcription is omitted, and
    • The documentation does not list any additional region-specific requirements or feature flags for input_audio_transcription,

    this pattern is consistent with a backend issue in the Sweden Central Realtime service rather than a missing prerequisite in the request. The next steps that align with the documentation are:

    1. Verify that the input_audio_transcription.model is one of the supported transcription models listed above.
    2. Confirm that the Realtime model deployment being used is supported in Sweden Central and that the deployment name is correct.
    3. If the 500 persists with a valid payload and supported model, treat it as a service-side incident and raise it via Azure support, including:
      • Region (Sweden Central), resource name, and subscription ID.
      • The exact POST /openai/v1/realtime/client_secrets payload with and without input_audio_transcription.
      • Timestamps and apim-request-id values from the failing 500 responses.

    There is no documented feature-gating or extra configuration required beyond the above; a persistent 500 in this scenario requires service investigation.
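To make the support request easier to assemble, each attempt can be recorded in a consistent shape. The sketch below is a hypothetical helper, not an Azure API: it takes the status code, response headers, and request payload you already have and extracts the apim-request-id header mentioned above.

```python
import json
from datetime import datetime, timezone


def build_support_record(status, headers, payload, region="Sweden Central"):
    """Bundle one request attempt into a dict suitable for a support ticket.

    Header names are matched case-insensitively, since HTTP clients differ
    in how they expose header casing.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    session = payload.get("session", {})
    return {
        "captured_at_utc": datetime.now(timezone.utc).isoformat(),
        "region": region,
        "status": status,
        "apim_request_id": lowered.get("apim-request-id"),
        "has_input_audio_transcription": "input_audio_transcription" in session,
        "payload": payload,
    }


if __name__ == "__main__":
    # Placeholder values for illustration; not real response data.
    record = build_support_record(
        500,
        {"apim-request-id": "<value from the failing response>"},
        {"session": {"type": "realtime",
                     "input_audio_transcription": {"model": "whisper-1"}}},
    )
    print(json.dumps(record, indent=2))
```

Capturing one record for the failing request and one for the succeeding request gives support the paired evidence described in step 3.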

