Azure OpenAI Realtime API returning server_error and failing to complete response

Mukul Munjal 0 Reputation points
2026-03-14T12:00:38.0366667+00:00

I am experiencing a server error when using the Azure OpenAI Realtime API. The request starts successfully, but the response fails before the assistant output is completed.

Issue Description

During a realtime session, the API returns a response object but the status changes to failed with a server_error.

The assistant response does not complete, which interrupts the realtime conversation.

{
  "type": "response.done",
  "event_id": "event_DICHED8JCp84nZTAtpUbw",
  "response": {
    "object": "realtime.response",
    "id": "resp_DICHBbtv1ecrxAa1RL8dW",
    "status": "failed",
    "status_details": {
      "type": "failed",
      "error": {
        "type": "server_error",
        "code": null,
        "message": "The server had an error while processing your request."
      }
    }
  },
  "session_id": "sess_DICGH4fzbh7lf6m4VMm28"
}

Observed Behavior

  • Realtime session connects successfully.
  • Assistant response begins processing.
  • The response status returns failed with server_error.
  • The assistant output (audio/transcript) is not completed.

Expected Behavior

The realtime API should return a successful response with the assistant output instead of failing.

Azure OpenAI Service
2 answers

  1. SRILAKSHMI C 16,975 Reputation points Microsoft External Staff Moderator
    2026-03-23T10:31:05.8+00:00

    Hello Mukul Munjal

    Welcome to Microsoft Q&A, and thank you for sharing the detailed description and event payload.

    From what you’ve described, the Realtime session is established successfully and response generation begins, but then fails mid‑stream with a server_error and the response status changes to failed. This is not expected behavior.

    What This Indicates

    • Session connection is successful
    • Response generation starts
    • Failure occurs during processing

    This pattern typically points to a transient backend or dependency issue within the Realtime service pipeline, rather than a problem with your request format.

    Recommended Checks & Mitigations

    While we investigate this from the backend side, here are steps you can take to stabilize and isolate the issue:

    Add Retry Logic

    • server_error (HTTP 5xx) is often transient.
    • Implement retry with exponential backoff.
    • If using an SDK, increase retry attempts (default is usually low).
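
    As a rough sketch of the retry pattern (the helper name, delay values, and `retryable` flag are illustrative, not part of any SDK):

    ```javascript
    // Generic retry-with-exponential-backoff helper (illustrative, not an SDK API).
    // `operation` is any async function; retries only on errors flagged as retryable.
    async function retryWithBackoff(operation, maxAttempts = 4, baseDelayMs = 500) {
      for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
          return await operation();
        } catch (err) {
          const isLastAttempt = attempt === maxAttempts;
          if (isLastAttempt || !err.retryable) throw err;
          // Exponential backoff with a little jitter: ~500 ms, 1 s, 2 s, ...
          const delay = baseDelayMs * 2 ** (attempt - 1) + Math.random() * 100;
          await new Promise((resolve) => setTimeout(resolve, delay));
        }
      }
    }
    ```

    You would wrap the operation that fails with `server_error` (for example, re-sending `response.create` or re-establishing the session) in such a helper.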

    Verify Endpoint, Deployment & Parameters

    Ensure your endpoint format is correct:

    https://<resource>.openai.azure.com/openai/deployments/<deployment>/...

    Double‑check deployment name, API version, and parameters such as stream: true, max_completion_tokens (avoid very large values), temperature, top_p.

    Test with a Minimal Request

    Try a simple text‑only streaming request:

    {
      "model": "<your-deployment>",
      "messages": [
        { "role": "user", "content": "Hello" }
      ],
      "stream": true,
      "max_completion_tokens": 1000
    }
    

    If this succeeds, gradually add audio or larger prompts to identify if complexity is triggering the failure.

    Check Input / Streaming Behavior

    Large prompts, long audio streams, or rapid event bursts can cause mid‑stream failures.

    For audio input, the format should be PCM16 mono at 24 kHz. Send small chunks (~100 ms) and properly base64-encode the payloads.
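
    A minimal sketch of chunking and encoding PCM16 audio before sending it, assuming 24 kHz mono 16-bit input (the helper function is illustrative; the event shape follows the documented input_audio_buffer.append client event):

    ```javascript
    // Split a PCM16 mono, 24 kHz buffer into ~100 ms base64-encoded events
    // (illustrative helper, not part of any SDK).
    const SAMPLE_RATE = 24000;   // samples per second
    const BYTES_PER_SAMPLE = 2;  // 16-bit PCM
    const CHUNK_MS = 100;
    const CHUNK_BYTES = (SAMPLE_RATE * BYTES_PER_SAMPLE * CHUNK_MS) / 1000; // 4800 bytes

    function toAudioEvents(pcmBuffer) {
      const events = [];
      for (let offset = 0; offset < pcmBuffer.length; offset += CHUNK_BYTES) {
        const chunk = pcmBuffer.subarray(offset, offset + CHUNK_BYTES);
        events.push({
          type: "input_audio_buffer.append",
          audio: chunk.toString("base64"),
        });
      }
      return events;
    }
    ```

    Each event would then be sent over the WebSocket, e.g. `ws.send(JSON.stringify(event))`, pacing the sends rather than bursting them all at once.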

    Use Azure Monitor to check request volume, throttling, or failures.

    If the region is under load, consider scaling your resource or testing in another region.

    Ensure you are not hitting realtime session limits or rate limits, which can surface as generic failures.

    Check Azure Service Health / Resource Health for any ongoing issues in your region.

    Please refer to this troubleshooting guide:

    GPT Realtime API troubleshooting (speech & audio): https://learn.microsoft.com/azure/ai-foundry/openai/how-to/realtime-audio#troubleshooting

    I hope this helps. Do let me know if you have any further queries.

    Thank you!


  2. Q&A Assist
    2026-03-14T12:01:04.05+00:00
    AI generated content. This answer was created with AI from Q&A Assist.

    The server_error with status: failed is a generic service-side failure while processing the realtime response. The event sequence in a healthy text-in, audio-out interaction should progress through multiple response.* and response.audio_* events and end with response.done where status reflects successful completion and includes the transcript (for example, “Hello! How can I assist you today?”). In this case, the pipeline is aborting before completion.

    From the documented behavior of the GPT Realtime API:

    • A session is established via /realtime, returning session.created.
    • A response.create client event is sent with modalities such as ["text", "audio"] and instructions.
    • The server emits intermediate events like:
      • response.output_item.added
      • conversation.item.created
      • response.content_part.added
      • multiple response.audio_transcript.delta and response.audio.delta events
    • A successful interaction ends with:
      • response.audio.done
      • response.audio_transcript.done
      • response.content_part.done
      • response.output_item.done
      • response.done with a completed transcript.

    In the failing case, the final response.done has status: failed and error.type: server_error, which indicates the service could not complete generation for that response. The error is not caused by client-side event sequencing alone, because the service already accepted the response.create and started processing.

    Actionable steps:

    1. Ensure the client is following the documented event flow and not prematurely closing the WebSocket or cancelling the response. The loop should continue reading until response.done or error is received:
         for await (const message of client.messages()) {
             console.log(JSON.stringify(message, null, 2));
             if (message.type === "response.done" || message.type === "error") {
                 break;
             }
         }
      
    2. Confirm that the response.create payload matches the documented shape (for example, modalities: ["text", "audio"], valid instructions, and other fields supported by the deployment) and that the session duration (max 30 minutes) is not exceeded.
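
       For reference, a minimal response.create client event following the documented shape might look like this (the instruction text is illustrative):

       ```javascript
       // Minimal response.create client event (shape per the Realtime API docs;
       // the instructions value is an illustrative example).
       const responseCreate = {
         type: "response.create",
         response: {
           modalities: ["text", "audio"],
           instructions: "Please respond briefly and clearly.",
         },
       };
       // Sent over the established WebSocket, e.g.:
       // ws.send(JSON.stringify(responseCreate));
       ```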
    3. Since server_error is a backend failure, capture the full response.id, event_id, and session_id from the failing response.done event and use them when engaging Azure support or monitoring tools. These identifiers allow service-side investigation of the failure.
    4. If the error is intermittent, retry the response.create within the same session or a new session and compare whether the same prompt or configuration consistently triggers server_error. Consistent failures with valid inputs typically require service-side investigation.

    If the failure persists with correct event sequencing and valid payloads, the issue is likely on the service side and should be escalated with the captured IDs and timestamps.


