Azure OpenAI – Does gpt-4o-realtime-whisper support live transcription during realtime voice-to-voice sessions?

Question

Muhammad Umar 0

Hello,

We are currently using Azure OpenAI realtime voice-to-voice sessions together with whisper-1 / gpt-4o-transcribe for live transcription.

We would like to clarify whether Azure OpenAI currently supports using gpt-4o-realtime-whisper as the transcription model during active realtime sessions.

Specifically:

Can gpt-4o-realtime-whisper transcribe an ongoing realtime voice-to-voice session simultaneously?
Or is it currently limited to classic Speech-to-Text streaming scenarios only?

We tested this successfully on the OpenAI platform directly, but on Azure OpenAI it does not appear to work.

Could you clarify whether:

this feature is already supported on Azure,
region/API-version dependent,
or not yet available?

Thank you.

Amanda Zhu 80 Reputation points

2026-05-12T19:20:30.27+00:00

I would not treat gpt-4o-realtime-whisper as a supported input_audio_transcription model inside an active Azure OpenAI voice-to-voice Realtime session. Azure does document GPT Realtime Whisper as a streaming transcription model available through the Realtime API, but the Realtime API’s enum of input audio transcription models doesn’t list gpt-4o-realtime-whisper.

For a voice-to-voice Realtime session today, use the documented input_audio_transcription setting with one of the listed models, such as whisper-1, gpt-4o-transcribe, gpt-4o-mini-transcribe, or gpt-4o-transcribe-diarize. You should also verify that your session is using a supported GPT realtime model and the /openai/v1 GA endpoint, since Azure Realtime support is model and deployment dependent.

One good check to try is to send a session.update using gpt-4o-realtime-whisper and confirm whether Azure returns session.updated or an error / conversation.item.input_audio_transcription.failed event naming the unsupported model. If it fails while the listed transcription models work, that suggests a current Azure support gap rather than an audio streaming issue.
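
The check described above could be sketched roughly as follows. This is a minimal illustration, not Azure's documented client code: the helper names build_transcription_update and classify_server_event are hypothetical, and the flat input_audio_transcription field follows the property name used in this thread (the exact payload shape may differ between API versions). Sending the payload over the WebSocket is left out.

```python
import json

# Event types named in the answer above.
TRANSCRIPTION_FAILED = "conversation.item.input_audio_transcription.failed"


def build_transcription_update(model: str) -> str:
    """Serialize a session.update event that asks for input transcription with `model`.

    Assumption: the session accepts a flat `input_audio_transcription` object,
    as named in this thread; adjust the shape to your API version.
    """
    event = {
        "type": "session.update",
        "session": {"input_audio_transcription": {"model": model}},
    }
    return json.dumps(event)


def classify_server_event(raw: str) -> str:
    """Map one raw server event to 'accepted', 'rejected', or 'other'."""
    kind = json.loads(raw).get("type", "")
    if kind == "session.updated":
        return "accepted"
    if kind in ("error", TRANSCRIPTION_FAILED):
        return "rejected"
    return "other"
```

Note that a session.updated reply only shows the configuration was accepted; a failure can still arrive later as a transcription failed event once audio is committed, so it is worth classifying events for the whole test call.
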
SRILAKSHMI C 18,745 Reputation points Microsoft External Staff Moderator

2026-05-18T15:27:46.9+00:00

Hi @Muhammad Umar

Did you get a chance to review the response above? Please let me know if you have any further questions.

Thank you!
SRILAKSHMI C 18,745 Reputation points Microsoft External Staff Moderator

2026-05-19T11:01:23.22+00:00

Hi @Muhammad Umar,

We haven’t heard back from you since the last response and are checking in to see whether you have found a resolution. If you have, please share it with the community, as it may help others. Otherwise, reply with more details and we will try to help.

Thank you!

Answer 1

Hello @Muhammad Umar

Thank you for reaching out to Microsoft Q&A.

Based on the current Azure OpenAI implementation, the behavior you are seeing is expected.

At this time, Azure OpenAI supports:

• realtime voice-to-voice interactions through GPT-4o realtime models (such as gpt-4o-realtime-preview)
• Whisper-based transcription using models like whisper-1 and gpt-4o-transcribe
• streaming Speech-to-Text scenarios through Azure Speech services

However, Azure OpenAI does not currently expose a dedicated gpt-4o-realtime-whisper model within the Azure realtime endpoint (/openai/v1/realtime) for simultaneous live transcription during active voice-to-voice sessions.

This means that running a realtime voice conversation with gpt-4o-realtime-whisper as an integrated, parallel transcription model is not currently supported in Azure OpenAI in the way it may work on the public OpenAI platform.

At present, if live transcription is required alongside a realtime voice session in Azure, the recommended approaches are:

Run a parallel Speech-to-Text streaming connection. You can use Azure Speech SDK Conversation Transcription, the standard Speech-to-Text streaming APIs, or whisper-1 / gpt-4o-transcribe alongside the realtime GPT-4o voice session.
Monitor model availability in your Azure region. Realtime model support in Azure OpenAI depends on region availability, API version, deployment type, and staged rollout status.
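
As a rough sketch of the first approach, the same microphone chunks can be fanned out to two connections: the realtime voice session and a separate transcription stream. The function fan_out_audio and the two sender callables are hypothetical placeholders for whatever WebSocket or SDK clients you actually use; only the input_audio_buffer.append event (with base64-encoded audio) is taken from the Realtime API.

```python
import base64
import json
from typing import Callable, Iterable


def fan_out_audio(
    chunks: Iterable[bytes],
    send_realtime: Callable[[str], None],
    send_transcriber: Callable[[bytes], None],
) -> int:
    """Forward every audio chunk to both sinks and return the number of chunks sent.

    send_realtime receives a JSON `input_audio_buffer.append` event;
    send_transcriber receives the raw bytes (hypothetical second connection).
    """
    count = 0
    for chunk in chunks:
        event = {
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(chunk).decode("ascii"),
        }
        send_realtime(json.dumps(event))
        send_transcriber(chunk)
        count += 1
    return count
```

In a real client each sender would write to its own connection, and the audio format expected by the second service (sample rate, encoding) would need to match what that service documents.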

When/if gpt-4o-realtime-whisper becomes available in Azure OpenAI, it would appear in the supported model list for your region and API version.

Based on current availability, the integrated realtime Whisper functionality appears to be available today on the public OpenAI platform and in certain Microsoft Foundry scenarios, but it is not yet fully exposed through the Azure OpenAI realtime APIs.

Thank you!

Answer 2

kagiyama yutaka 3,415

I think Azure doesn’t list gpt-4o-realtime-whisper for live input_audio_transcription, so realtime voice-to-voice transcription isn’t available with that model right now. whisper-1 and gpt-4o-transcribe work reliably today, and swapping one of them in is all you need to keep the session stable.

Answer 3

AI answer

GPT Realtime Whisper in Azure OpenAI is designed specifically for realtime, streaming transcription of live audio and is available through the Realtime API, not only as a classic Speech-to-Text file API.

From the provided information:

GPT Realtime Whisper is a streaming transcription model that “converts live audio to text in real time” and is intended to run “alongside other realtime models (such as GPT Realtime Translate) to provide source-language transcription in parallel with translation.” This explicitly covers ongoing, low‑latency, realtime scenarios such as live captions and monitoring.
Realtime sessions opt into transcription via the input_audio_transcription property in a session.update event. When a transcription model such as whisper-1 is configured there, the service emits conversation.item.input_audio_transcription.completed events during the session.
The documentation shows GPT Realtime Whisper as part of the same Realtime API surface as other GPT realtime models, and it is deployed as a Global Standard (pay‑as‑you‑go) model from the model catalog in Microsoft Foundry.
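
To make the event flow above concrete, here is a small sketch of how a client might collect transcripts from the server events of a session. The helper collect_transcripts is hypothetical; the event type names and the item_id, transcript, and error fields are assumed from the OpenAI Realtime API event reference and should be checked against the Azure API version in use.

```python
import json
from typing import Dict, Iterable, Tuple

COMPLETED = "conversation.item.input_audio_transcription.completed"
FAILED = "conversation.item.input_audio_transcription.failed"


def collect_transcripts(raw_events: Iterable[str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (transcripts, failures), each keyed by conversation item id.

    Events of any other type are ignored.
    """
    transcripts: Dict[str, str] = {}
    failures: Dict[str, str] = {}
    for raw in raw_events:
        event = json.loads(raw)
        kind = event.get("type")
        item_id = event.get("item_id", "")
        if kind == COMPLETED:
            transcripts[item_id] = event.get("transcript", "")
        elif kind == FAILED:
            failures[item_id] = event.get("error", {}).get("message", "unknown error")
    return transcripts, failures
```

If the failures map fills up only when gpt-4o-realtime-whisper is configured, while whisper-1 produces transcripts, that points at model support rather than at the audio pipeline.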

However, the context provided does not explicitly state that gpt-4o-realtime-whisper can be selected as the input_audio_transcription model for an existing GPT realtime voice‑to‑voice session in Azure OpenAI, nor does it list that exact model name as a supported value in the Realtime session configuration. The context only names whisper-1 and the gpt-4o-*-transcribe family in that role.

Therefore, based on the available documentation:

Azure OpenAI supports realtime streaming transcription via GPT Realtime Whisper through the Realtime API.
Realtime sessions can be configured for live transcription using input_audio_transcription with supported models such as whisper-1.
The documentation provided does not confirm that gpt-4o-realtime-whisper is currently a supported input_audio_transcription model for Azure GPT realtime voice‑to‑voice sessions, nor whether this is region or API‑version dependent.

If gpt-4o-realtime-whisper is not working as an input_audio_transcription model in a Realtime session while whisper-1 works, this is not clarified in the supplied documentation and may indicate a current limitation or a configuration/availability issue that is not documented here.
