Realtime Transcription Failing When Using gpt-4o-transcribe (Whisper-1 Unaffected)

Abdul Rehman 60 Reputation points
2025-11-18T09:01:03.9133333+00:00

Realtime transcription stopped working specifically when using the gpt-4o-transcribe model. Standard transcription via whisper-1 continued working without any issues. The problem only occurred yesterday between 08:30 and 15:30.

During this timeframe, transcription worked correctly when calling the transcription endpoint directly, but when using the Azure OpenAI Realtime API, transcription systematically failed. Azure OpenAI appears to trigger transcription internally for realtime audio, but this internal call failed for gpt-4o-transcribe.
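For context, the realtime session requested server-side input-audio transcription roughly as follows (a minimal sketch of the `session.update` payload; the wrapping websocket/connection code is omitted):

```python
import json

# Sketch of the session.update event that enables server-side input-audio
# transcription in an Azure OpenAI Realtime session. Swapping the model name
# between "gpt-4o-transcribe" and "whisper-1" is the only difference between
# the failing and working configurations described above.
def build_session_update(transcription_model: str) -> str:
    event = {
        "type": "session.update",
        "session": {
            "input_audio_transcription": {
                "model": transcription_model  # e.g. "gpt-4o-transcribe" or "whisper-1"
            },
        },
    }
    return json.dumps(event)

payload = build_session_update("gpt-4o-transcribe")
```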
Affected Model:

  • gpt-4o-transcribe (only during the affected time window)
  • whisper-1 (working normally)

Time Window of Impact:

Yesterday, 08:30 → 15:30 (local time)


Observed Realtime Events:
The realtime connection produced repeated events of this form:
{
  "event": {
    "type": "conversation.item.input_audio_transcription.failed",
    "error": {
      "code": null,
      "type": "server_error",
      "param": null,
      "message": "Input transcription failed for item 'item_CcnHforF7TNkgG594R1Q1' ."
    },
    "item_id": "item_CcnHforF7TNkgG594R1Q1",
    "event_id": "event_CcnHj7YYMIdh88hgEeWUj",
    "content_index": 0
  }
}
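Client-side, detecting these failures can be sketched as follows (the event type and field names match the payload above; the fallback callback is a placeholder for whatever recovery logic is appropriate, such as re-transcribing via the standalone endpoint):

```python
import json

# Sketch of a handler that spots input-audio transcription failures in the
# realtime event stream and hands them to a caller-supplied fallback.
def handle_realtime_event(raw: str, on_transcription_failed) -> bool:
    """Return True if the event was a transcription failure, else False."""
    msg = json.loads(raw)
    event = msg.get("event", msg)  # unwrap the {"event": {...}} envelope if present
    if event.get("type") == "conversation.item.input_audio_transcription.failed":
        on_transcription_failed(event.get("item_id"), event.get("error", {}))
        return True
    return False
```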

Expected Behavior: Realtime transcription using gpt-4o-transcribe should behave the same as standard transcription or as realtime transcription using whisper-1.

Actual Behavior: Realtime transcription fails with a server-side error only for gpt-4o-transcribe, while standalone transcription and whisper-1 realtime transcription work correctly.

Request: Please investigate whether this was a temporary service degradation on Azure OpenAI for the gpt-4o-transcribe realtime pathway, and confirm whether any mitigations or configuration changes are needed on our side.

Azure Speech in Foundry Tools

1 answer

  1. SRILAKSHMI C 17,875 Reputation points Microsoft External Staff Moderator
    2025-11-18T11:39:30.4933333+00:00

    Hello Abdul Rehman,

    Welcome to Microsoft Q&A.

    Thank you for providing the detailed breakdown of the issue.

    Based on your description, the behavior you observed aligns with known limitations in how Azure OpenAI currently handles transcription within the Realtime API, especially when compared to the standalone transcription endpoint.

    From your testing, whisper-1 continued working normally, while gpt-4o-transcribe failed only during realtime usage in the window between 08:30 and 15:30. The repeated conversation.item.input_audio_transcription.failed events indicate a server-side failure in the realtime transcription pipeline specific to this model.

    It’s important to clarify that Azure OpenAI’s Realtime API does not yet implement full support for server-side, Whisper-style transcription inside a realtime session. Even though the API accepts audio, the system does not reliably trigger the same transcription pathway used by the standalone endpoint. As a result, realtime transcription with gpt-4o-transcribe can fail intermittently, exactly as you experienced.

    Regarding the specific time window of 08:30 to 15:30, this behavior is consistent with a temporary service degradation or instability affecting only the gpt-4o-transcribe realtime pathway. Because whisper-1 and the direct transcription endpoint were unaffected, this points to a model-specific backend disruption rather than any issue on your configuration or usage.

    At this time, realtime transcription with gpt-4o-transcribe is not fully supported and may not behave the same way as standard transcription. Your expectation is valid, but today that functionality isn’t guaranteed by the platform.

    For reliable realtime audio workflows, the recommended approach is to transcribe audio externally (either client-side or via the Whisper/transcription endpoint) and then send the resulting text into the Realtime API. This dual-pipeline approach is currently the most stable option and is widely used in production. If realtime audio-to-text is required directly inside the session, whisper-1 is the more stable choice compared to gpt-4o-transcribe.
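The dual-pipeline approach described above can be sketched as: transcribe the audio first (e.g. via the standalone transcription endpoint, not shown), then inject the resulting text into the realtime session as a user message. This is a sketch of the `conversation.item.create` payload only; the actual transcription call is assumed to happen elsewhere:

```python
import json

# Build the realtime event that injects externally produced transcript text
# into the session as a user message, bypassing server-side audio transcription.
def text_item_event(transcript: str) -> str:
    event = {
        "type": "conversation.item.create",
        "item": {
            "type": "message",
            "role": "user",
            "content": [{"type": "input_text", "text": transcript}],
        },
    }
    return json.dumps(event)
```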

    The failure you observed matches existing platform limitations and a temporary service degradation affecting gpt-4o-transcribe. No configuration changes on your side would have prevented the issue.

    I hope this helps. Do let me know if you have any further queries.

    Thank you!

