Azure OpenAI – Does gpt-4o-realtime-whisper support live transcription during realtime voice-to-voice sessions?

Muhammad Umar 0 Reputation points
2026-05-12T08:55:53.58+00:00

Hello,

We are currently using Azure OpenAI realtime voice-to-voice sessions together with whisper-1 / gpt-4o-transcribe for live transcription.

We would like to clarify whether Azure OpenAI currently supports using gpt-4o-realtime-whisper as the transcription model during active realtime sessions.

Specifically:

  • Can gpt-4o-realtime-whisper transcribe an ongoing realtime voice-to-voice session simultaneously?
  • Or is it currently limited to classic Speech-to-Text streaming scenarios only?

We tested this successfully on the OpenAI platform directly, but on Azure OpenAI it does not appear to work.

Could you clarify whether:

  • is already supported on Azure,
  • is supported but dependent on region or API version,
  • or is not yet available?

Thank you.

Azure OpenAI in Foundry Models

2 answers

  1. kagiyama yutaka 2,910 Reputation points
    2026-05-12T10:25:57.3366667+00:00

    I think Azure doesn't list gpt-4o-realtime-whisper as a supported model for live input_audio_transcription, so realtime voice-to-voice transcription isn't available with that model right now. whisper-1 and gpt-4o-transcribe work reliably today, and swapping one of them in is all you need to keep the session stable.


  2. Q&A Assist
    2026-05-12T08:56:13.9066667+00:00
    AI generated content. This question contains an answer created with AI from Q&A Assist.

    GPT Realtime Whisper in Azure OpenAI is designed specifically for realtime, streaming transcription of live audio and is available through the Realtime API, not only as a classic Speech-to-Text file API.

    From the provided information:

    • GPT Realtime Whisper is a streaming transcription model that “converts live audio to text in real time” and is intended to run “alongside other realtime models (such as GPT Realtime Translate) to provide source-language transcription in parallel with translation.” This explicitly covers ongoing, low‑latency, realtime scenarios such as live captions and monitoring.
    • Realtime sessions opt into transcription via the input_audio_transcription property in a session.update event. When a transcription model such as whisper-1 is configured there, the service emits conversation.item.input_audio_transcription.completed events during the session.
    • The documentation shows GPT Realtime Whisper as part of the same Realtime API surface as other GPT realtime models, and it is deployed as a Global Standard (pay‑as‑you‑go) model from the model catalog in Microsoft Foundry.
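As a rough illustration of the session.update mechanism described in the points above, the following Python sketch builds the event payload that would be sent over the Realtime WebSocket. The helper name build_session_update is hypothetical, and the field layout assumes the session.update shape in which input_audio_transcription sits directly under session; newer API versions may nest it differently. Whether Azure accepts a given model name here is exactly what this thread leaves open.

```python
import json


def build_session_update(transcription_model: str = "whisper-1") -> str:
    """Return a session.update event (as a JSON string) that opts the
    session into input audio transcription.

    Assumption: the transcription model is configured under
    session.input_audio_transcription.model, as described above.
    """
    event = {
        "type": "session.update",
        "session": {
            # The model name is passed through unchanged; the service
            # decides whether it is supported for this role.
            "input_audio_transcription": {"model": transcription_model},
        },
    }
    return json.dumps(event)


if __name__ == "__main__":
    # Print the payload for the model the thread reports as working.
    print(build_session_update("whisper-1"))
```

Swapping the argument to "gpt-4o-transcribe" or "gpt-4o-realtime-whisper" changes only the model string, which makes it easy to compare how the service responds to each value.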

    However, the context provided does not explicitly state that gpt-4o-realtime-whisper can be selected as the input_audio_transcription model for an existing GPT realtime voice‑to‑voice session in Azure OpenAI, nor does it list that exact model name as a supported value in the Realtime session configuration. The context only names whisper-1 and the gpt-4o-*-transcribe family in that role.

    Therefore, based on the available documentation:

    • Azure OpenAI supports realtime streaming transcription via GPT Realtime Whisper through the Realtime API.
    • Realtime sessions can be configured for live transcription using input_audio_transcription with supported models such as whisper-1.
    • The documentation provided does not confirm that gpt-4o-realtime-whisper is currently a supported input_audio_transcription model for Azure GPT realtime voice‑to‑voice sessions, nor whether this is region or API‑version dependent.

    If gpt-4o-realtime-whisper is not working as an input_audio_transcription model in a Realtime session while whisper-1 works, this is not clarified in the supplied documentation and may indicate a current limitation or a configuration/availability issue that is not documented here.
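To tell a silent limitation apart from an explicit rejection, it can help to log how the service responds after the transcription model is configured. The sketch below is a hypothetical helper (classify_event is not part of any SDK) that sorts raw server events into transcripts, errors, and everything else. It assumes that completed transcriptions arrive in an event whose type ends in audio_transcription.completed with a transcript field, and that failures arrive either as a generic error event or as a matching .failed event carrying an error.message field.

```python
import json


def classify_event(raw: str) -> tuple[str, str]:
    """Classify one raw server event (JSON string) from a Realtime session.

    Returns (kind, detail) where kind is "transcript", "error", or "other".
    """
    event = json.loads(raw)
    etype = event.get("type", "")

    # Suffix match so that either spelling of the completion event is caught.
    if etype.endswith("audio_transcription.completed"):
        return ("transcript", event.get("transcript", ""))

    # Assumed failure signals: a generic error event or a transcription
    # failure event, both carrying an "error" object with a "message".
    if etype == "error" or etype.endswith("audio_transcription.failed"):
        error = event.get("error") or {}
        return ("error", error.get("message", ""))

    return ("other", etype)


if __name__ == "__main__":
    sample = json.dumps({"type": "error", "error": {"message": "example"}})
    print(classify_event(sample))
```

If no transcript events and no error events appear after the session.update, that points toward the model being ignored for this role rather than rejected outright.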
