
Azure OpenAI Realtime API + gpt-realtime-whisper: is realtime transcription currently supported?

Baptiste AUTIN 0 Reputation points
2026-06-10T15:09:06.0833333+00:00

Hello,

I'm trying to implement real-time speech-to-text transcription using Azure OpenAI Realtime API and the gpt-realtime-whisper model, following both the Azure and OpenAI documentation.

However, I am observing what appears to be a contradiction between model availability and API behavior.

Environment

Azure OpenAI resource in France Central

API endpoint:

wss://<resource>.openai.azure.com/openai/v1/realtime

Authentication via API key header

Azure deployments:

  • gpt-realtime-whisper
  • gpt-realtime-1.5

Java 17

Test 1: Connect directly with gpt-realtime-whisper

WebSocket URL:

wss://<resource>.openai.azure.com/openai/v1/realtime?model=gpt-realtime-whisper

Azure rejects the handshake with HTTP 400:

{
  "error": {
    "code": "OpperationNotSupported",
    "message": "The realtime operation does not work with the specified model. Please choose different model and try again."
  }
}

Response headers include:

apim-request-id: 4b30c637-e2c5-41f9-8e2c-5d37fa3d22d8
x-ms-region: France Central

This suggests that gpt-realtime-whisper cannot be used as the model for the /realtime connection itself.
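
For anyone reproducing this from Java 17, the handshake can be probed with the JDK's built-in java.net.http WebSocket client. This is a minimal sketch, not an official Azure sample: the names RealtimeHandshakeProbe, realtimeUri, describeRejection and probe are hypothetical, as is the AZURE_OPENAI_API_KEY environment variable, and it assumes the api-key header authentication described in the Environment section. When the handshake is rejected, it reports the HTTP status together with the apim-request-id and Location headers if the service sent them.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpResponse;
import java.net.http.WebSocket;
import java.net.http.WebSocketHandshakeException;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletionException;

final class RealtimeHandshakeProbe {

    // Builds the GA-style URL from the question; "resource" is the Azure resource name.
    static URI realtimeUri(String resource, String deployment) {
        String model = URLEncoder.encode(deployment, StandardCharsets.UTF_8);
        return URI.create("wss://" + resource + ".openai.azure.com/openai/v1/realtime?model=" + model);
    }

    // Summarises a rejected handshake: status code plus two diagnostic headers.
    static String describeRejection(HttpResponse<?> response) {
        String requestId = response.headers().firstValue("apim-request-id").orElse("n/a");
        String location = response.headers().firstValue("Location").orElse("n/a");
        return "HTTP " + response.statusCode()
                + ", apim-request-id=" + requestId
                + ", Location=" + location;
    }

    // Attempts the WebSocket handshake once and returns a one-line outcome.
    static String probe(URI uri, String apiKey) {
        HttpClient client = HttpClient.newHttpClient();
        try {
            WebSocket ws = client.newWebSocketBuilder()
                    .header("api-key", apiKey)
                    .buildAsync(uri, new WebSocket.Listener() { })
                    .join();
            ws.sendClose(WebSocket.NORMAL_CLOSURE, "probe done").join();
            return "handshake accepted";
        } catch (CompletionException e) {
            if (e.getCause() instanceof WebSocketHandshakeException) {
                WebSocketHandshakeException rejected = (WebSocketHandshakeException) e.getCause();
                return describeRejection(rejected.getResponse());
            }
            return "connection failed: " + e.getCause();
        }
    }

    public static void main(String[] args) {
        // Usage: java RealtimeHandshakeProbe <resource-name> <deployment-name>
        // The environment variable name is an assumption made for this sketch.
        String apiKey = System.getenv("AZURE_OPENAI_API_KEY");
        URI uri = realtimeUri(args[0], args[1]);
        System.out.println(uri + " -> " + probe(uri, apiKey));
    }
}
```

Running it once per deployment name gives a side-by-side record of which deployments the endpoint accepts as the connection model, with the request ID needed for a support ticket.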

Test 2: Connect with gpt-realtime-1.5 and configure transcription

I then created a separate deployment:

realtime deployment      = gpt-realtime-1.5
transcription deployment = gpt-realtime-whisper

Connection:

wss://<resource>.openai.azure.com/openai/v1/realtime?model=gpt-realtime-1.5

The WebSocket handshake succeeds.

However, when sending a transcription-oriented session.update event, Azure returns:

{
  "type": "error",
  "error": {
    "type": "invalid_request_error",
    "code": "invalid_parameter",
    "message": "Passing a transcription session update event to a realtime session is not allowed."
  }
}

Event ID:

event_DpCbAikBjakRWZkdns5ey

Test 3: Using the preview transcription session workflow

I also tested the preview transcription-specific workflow documented by Azure.

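First, I successfully created a transcription session using:

POST https://<resource>/openai/realtimeapi/transcription_sessions?api-version=2025-04-01-preview

The response was successful and returned a valid transcription session object:

object = realtime.transcription_session
id     = sess_DpE8UrTWVVY9rJQhPwDlb

The response also contained a valid client_secret, but no explicit WebSocket URL.

Following the Azure preview documentation, I then attempted to connect to:

wss://<resource>/openai/realtime?api-version=2025-04-01-preview&intent=transcription&deployment=gpt-realtime-whisper

Azure responded with an HTTP 302 redirect to:

wss://<resource>/v1/realtime?api-version=2025-04-01-preview&intent=transcription&deployment=gpt-realtime-whisper&api-key=...

However, the redirected endpoint immediately returned:

HTTP 404 Resource not found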

According to the Azure documentation, preview endpoints should use /openai/realtime, while GA endpoints use /openai/v1/realtime, and mixing the two formats may result in a 404 error. In this case, Azure itself appears to redirect from the documented preview endpoint to a /v1/realtime endpoint that then returns 404. Is this behavior expected, or could this indicate a platform issue in the current implementation of realtime transcription sessions?

Questions

Can Microsoft confirm the current support status of Realtime Transcription in Azure OpenAI?

Specifically:

Is gpt-realtime-whisper currently supported for Azure's /realtime WebSocket endpoint?

Is Azure OpenAI expected to support OpenAI-style transcription sessions (session.type = "transcription")?

If realtime transcription is supported, what is the correct deployment/model combination and session configuration?

If it is not yet supported in Azure, is the current behavior expected even though gpt-realtime-whisper is available as a deployable model?

According to the OpenAI documentation, gpt-realtime-whisper is the recommended model for realtime transcription.

However, in Azure:

  • gpt-realtime-whisper is rejected as a /realtime connection model.
  • gpt-realtime-1.5 accepts the connection but rejects transcription session updates.

Therefore, it is unclear whether realtime transcription is currently available in Azure OpenAI or whether only conversational realtime sessions are supported.

Any clarification would be greatly appreciated.

Thank you.

Azure OpenAI in Foundry Models

2 answers

  1. Baptiste AUTIN 0 Reputation points
    2026-06-11T08:02:30.6833333+00:00

    Thank you, this clarifies the GA /openai/v1/realtime behavior.

    Your explanation is consistent with our earlier observations:

    • using gpt-realtime-whisper as the /realtime connection model returns “The realtime operation does not work with the specified model”;
    • using an OpenAI-style dedicated transcription session on a standard realtime session is also rejected.

    So we understand that the supported GA pattern is:

    • connect to /openai/v1/realtime with a supported realtime model such as gpt-realtime-1.5;
    • enable transcription with session.input_audio_transcription using a deployment name.

    However, we also observed a different Azure behavior on June 10, 2026 with the preview transcription_sessions flow:

    1. POST https://<resource>/openai/realtimeapi/transcription_sessions?api-version=2025-04-01-preview succeeds
    2. Azure returns:
      • object = realtime.transcription_session
      • id = sess_DpE8UrTWVVY9rJQhPwDlb
      • a valid client_secret
    3. Opening: wss://<resource>/openai/realtime?api-version=2025-04-01-preview&intent=transcription&deployment=gpt-realtime-whisper returns HTTP 302
    4. Azure redirects to: wss://<resource>/v1/realtime?api-version=2025-04-01-preview&intent=transcription&deployment=gpt-realtime-whisper&api-key=...
    5. Following that redirected URL returns HTTP 404 Resource not found

    (apim-request-id: b270a055-1574-4ed6-83ec-f27ddc6f757c)

    Could you please clarify the status of this preview transcription_sessions flow on Azure OpenAI?

    • Is /openai/realtimeapi/transcription_sessions currently unsupported or deprecated for gpt-realtime-whisper?
    • If it is unsupported, why does it successfully create a realtime.transcription_session and return a client_secret?
    • If it is supported, what is the correct WebSocket URL and authentication pattern to use after the session creation call?
    • Could you share a minimal official Azure example in Java for the recommended supported websocket transcription flow?

    If the supported path is only the GA realtime session with session.input_audio_transcription, we can switch our implementation to that model.


  2. AI answer

    2026-06-10T15:09:45.02+00:00

    Realtime transcription is supported in Azure OpenAI, but not via gpt-realtime-whisper as a /realtime connection model, and not via OpenAI-style session.type = "transcription" sessions.

    Based on the documented Azure behavior:

    1. gpt-realtime-whisper is not a valid model for the /realtime WebSocket connection
      • The Realtime API in Azure currently lists only GPT realtime models as supported connection models:
        • gpt-4o-realtime-preview
        • gpt-4o-mini-realtime-preview
        • gpt-realtime
        • gpt-realtime-mini (multiple versions)
        • gpt-realtime-1.5
      • These are the models that can be passed in the model query parameter when connecting to /openai/v1/realtime.
      • gpt-realtime-whisper is documented as a streaming transcription model that runs alongside other realtime models, not as the primary realtime session model.
      • This matches the error observed: “The realtime operation does not work with the specified model. Please choose different model and try again.”
    2. How realtime transcription is enabled in Azure OpenAI
      • Realtime transcription is enabled inside a realtime session by configuring input_audio_transcription on the session, not by using gpt-realtime-whisper as the connection model and not by using session.type = "transcription".
      • The documented pattern is:
        • Connect to /openai/v1/realtime with a supported realtime model (for example gpt-realtime or gpt-realtime-mini, and by extension gpt-realtime-1.5 as listed in the supported models).
        • Immediately send a session.update event that sets session.input_audio_transcription.
      • For Azure OpenAI, the model field inside input_audio_transcription must be the name of an existing model deployment, not a raw model ID:
             {
               "type": "session.update",
               "session": {
                 "input_audio_transcription": {
                   "model": "<your-transcription-deployment-name>"
                 }
               }
             }
        
      • The realtime reference explicitly calls out this Azure deviation: the transcription model is referenced by deployment name (for example my-gpt-4o-transcribe-deployment), not by whisper-1 or similar.
    3. Which transcription models can be used with realtime sessions
      • For Voice Live / realtime-style scenarios, documentation shows two patterns:
        • Using a Foundry/MAI transcription deployment (for example "model": "mai-transcribe-1").
        • When using gpt-realtime or gpt-realtime-mini, using OpenAI transcription models such as whisper-1, gpt-4o-transcribe, gpt-4o-mini-transcribe, or gpt-4o-transcribe-diarize in input_audio_transcription.
      • In Azure, these must again be referenced by deployment name of the corresponding transcription model.
      • GPT Realtime Whisper is described as a streaming transcription model available via the Realtime API and deployed from the Foundry model catalog, to be used “alongside other realtime models … to provide source-language transcription in parallel with translation.” This aligns with the input_audio_transcription pattern rather than being the primary /realtime model.
    4. OpenAI-style session.type = "transcription" sessions
      • Azure’s Realtime API is documented as following the OpenAI Realtime API spec, but with explicit Azure deviations.
      • One documented deviation is that the model in input_audio_transcription must be a deployment name.
      • The Azure docs and examples for realtime audio and Voice Live show transcription configured via session.update on a standard realtime session, not via a separate session.type = "transcription" session.
      • The error “Passing a transcription session update event to a realtime session is not allowed.” indicates that the OpenAI-style dedicated transcription session flow is not currently exposed in the Azure GA Realtime endpoint.
    5. Expected behavior vs. current limitations
      • Supported and expected in Azure today:
        • Connect to /openai/v1/realtime with a supported realtime model (for example gpt-realtime-1.5).
        • Enable transcription by setting session.input_audio_transcription to a valid transcription deployment (for example a deployment of gpt-4o-transcribe, whisper-1, or a Foundry transcription model), referenced by deployment name.
        • Receive transcription events (conversation.item.input_audio_transcription.completed) as part of the same realtime session.
      • Not supported / not documented as supported in Azure today:
        • Using gpt-realtime-whisper as the model parameter for the /realtime WebSocket connection.
        • Using OpenAI-style dedicated transcription sessions (session.type = "transcription") on Azure’s GA /openai/v1/realtime endpoint.
    6. Why gpt-realtime-whisper is deployable but rejected as a /realtime model
      • GPT Realtime Whisper is available as a Global Standard deployment in Foundry and is intended to be used as a streaming transcription model alongside other realtime models.
      • In Azure’s Realtime API, this means it is used via input_audio_transcription within a session whose primary model is one of the supported realtime GPT models, not as the primary model for the WebSocket connection.
      • The behavior observed (400 on using it as the connection model, and errors when trying to use OpenAI-style transcription sessions) is therefore consistent with the current Azure documentation and model roles.
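
The session.update step from point 2 can be sketched in Java 17 as follows. This is an illustration under assumptions rather than an official sample: TranscriptionSessionUpdate and its methods are hypothetical names, the payload is the session.update shape quoted in point 2, and it has not been verified against the live service. The WebSocket passed in is assumed to be already connected to /openai/v1/realtime with a supported realtime model.

```java
import java.net.http.WebSocket;
import java.util.concurrent.CompletableFuture;

final class TranscriptionSessionUpdate {

    // Escapes backslashes and double quotes, the two characters that would
    // break a JSON string literal. Control characters are not handled here.
    static String jsonEscape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds the session.update event; "model" carries the Azure deployment name,
    // not a raw model ID.
    static String sessionUpdateJson(String transcriptionDeployment) {
        return "{\"type\":\"session.update\",\"session\":{"
                + "\"input_audio_transcription\":{\"model\":\""
                + jsonEscape(transcriptionDeployment) + "\"}}}";
    }

    // Sends the event as one complete text frame on an already open connection.
    static CompletableFuture<WebSocket> enableTranscription(WebSocket ws, String transcriptionDeployment) {
        return ws.sendText(sessionUpdateJson(transcriptionDeployment), true);
    }

    public static void main(String[] args) {
        // Example deployment name taken from point 2; replace it with a real one.
        System.out.println(sessionUpdateJson("my-gpt-4o-transcribe-deployment"));
    }
}
```

Building the string by hand keeps the sketch free of dependencies; in a real client, a JSON library such as Jackson or Gson would be the safer choice.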

    In summary:

    • Realtime transcription is supported in Azure OpenAI, but only as a feature of a standard realtime session using a supported GPT realtime model (gpt-realtime, gpt-realtime-mini, gpt-realtime-1.5, etc.).
    • gpt-realtime-whisper is not supported as the /realtime connection model; it is used as a transcription model via session.input_audio_transcription, referenced by its deployment name.
    • OpenAI-style dedicated transcription sessions (session.type = "transcription") are not documented as supported on Azure’s GA /openai/v1/realtime endpoint, and the errors seen when attempting that flow are expected under the current Azure behavior.
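
On the receiving side, a Java 17 listener for the resulting events might look like the sketch below. It is hedged in several ways: TranscriptListener, firstString and describe are hypothetical names; the event name conversation.item.input_audio_transcription.completed and its transcript field are taken from the OpenAI Realtime API and are assumed to apply to Azure unchanged; and the regex-based field lookup is only a dependency-free stand-in for a real JSON parser (it returns values with their JSON escapes intact and picks the first matching key anywhere in the message).

```java
import java.net.http.WebSocket;
import java.util.concurrent.CompletionStage;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TranscriptListener implements WebSocket.Listener {

    private static final String COMPLETED =
            "conversation.item.input_audio_transcription.completed";

    // Collects fragments until a full message has arrived.
    private final StringBuilder buffer = new StringBuilder();

    // Returns the first string value stored under the given key, or null if absent.
    static String firstString(String json, String key) {
        Pattern pattern = Pattern.compile(
                "\"" + Pattern.quote(key) + "\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");
        Matcher matcher = pattern.matcher(json);
        return matcher.find() ? matcher.group(1) : null;
    }

    // Maps one complete server event to a short log line.
    static String describe(String json) {
        String type = firstString(json, "type");
        if (COMPLETED.equals(type)) {
            return "transcript: " + firstString(json, "transcript");
        }
        if ("error".equals(type)) {
            return "error: " + firstString(json, "message");
        }
        return "ignored: " + type;
    }

    @Override
    public CompletionStage<?> onText(WebSocket ws, CharSequence data, boolean last) {
        buffer.append(data);          // a message may arrive in several fragments
        if (last) {
            System.out.println(describe(buffer.toString()));
            buffer.setLength(0);
        }
        ws.request(1);                // ask the client for the next frame
        return null;                  // the data may be reclaimed immediately
    }
}
```

An instance would be passed as the listener argument of buildAsync when opening the connection, so that error events such as the one in Test 2 are logged alongside completed transcripts.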

    AI-generated content may be incorrect. Read our transparency notes for more information.
