
OpenAI Realtime API transcription with GPT-4o-transcribe, WS connection is forcefully closed with code 1006.

Soender 20 Reputation points
2025-11-22T17:02:04.99+00:00

Context:

I am building a voice-to-voice conversational AI service, which necessarily includes an STT component. For this, I have deployed GPT-4o-transcribe with the OpenAI Realtime API (hosted on Azure, via the Azure OpenAI service).
I have followed the documentation provided by OpenAI at:
https://platform.openai.com/docs/api-reference/realtime
(I understand that there are architectural differences between OpenAI and Azure OpenAI, but the documentation appears to apply to the Azure OpenAI service as well.)

I can open the connection and receive transcriptions. Everything works well, but at random points in time the connection is closed with code 1006. No reason is given, and no preceding events show any sign of failure. Because this happens randomly, the issue is very hard to reproduce.

Technical implementation:

I run my own VAD (Voice Activity Detection) model, so I do not rely on the VAD provided by the Realtime service. Moreover, audio is continuously streamed from the client to my server in small chunks. Here is the current flow:

At startup, the connection to the Realtime service is created, with:
Target URL:

wss://${AZURE_RESOURCE}/openai/v1/realtime?deployment=${AZURE_DEPLOYMENT}&intent=transcription

Headers:

{
  headers: {"api-key": KEY}
}
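For reference, here is a minimal sketch of how that connection setup might look in Node.js, assuming the "ws" npm package; AZURE_RESOURCE, AZURE_DEPLOYMENT, and KEY are placeholders from above, not real values:

```javascript
// Builds the target URL for the transcription intent, matching the URL above.
function buildRealtimeUrl(resource, deployment) {
  return `wss://${resource}/openai/v1/realtime?deployment=${deployment}&intent=transcription`;
}

// Builds the WebSocket options object carrying the api-key header.
function buildHeaders(key) {
  return { headers: { "api-key": key } };
}

// Usage (requires the "ws" npm package; not run here):
// const WebSocket = require("ws");
// const ws = new WebSocket(
//   buildRealtimeUrl(AZURE_RESOURCE, AZURE_DEPLOYMENT),
//   buildHeaders(KEY)
// );
// ws.on("close", (code, reason) => console.log(`closed: ${code}`));
```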

And session.update:

{
  type: "session.update",
  session: {
    type: "transcription",
    include: [
      "item.input_audio_transcription.logprobs"
    ],
    audio: {
      input: {
        format: {
          rate: 24000,
          type: "audio/pcm"
        },
        noise_reduction: {
          type: "near_field"
        },
        turn_detection: null,
        transcription: {
          language: "en",
          model: AZURE_DEPLOYMENT,
          prompt: PROMPT
        }
      }
    }
  }
}

Then, the flow is as follows:

  1. VAD emits that the user is speaking;
  2. This triggers a message to the Realtime service:
{ type: "input_audio_buffer.clear" }
  3. Every chunk of audio is sent as PCM 24k, base64 encoded, to the Realtime service as:
{
   type: "input_audio_buffer.append",
   audio: BASE_64_PCM_24
}
  4. At some point, the VAD emits that the user has stopped speaking, which triggers this message to the Realtime service:
{ type: "input_audio_buffer.commit" }
  5. A result is obtained through the Realtime service message:
conversation.item.input_audio_transcription.completed

Then N seconds pass, until the VAD emits that the user is speaking again, and the process repeats from step 1.
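The clear/append/commit messages in this flow can be sketched as small builders (event type strings taken from the messages above; each result would be sent with ws.send(JSON.stringify(...))):

```javascript
// Step 2: reset the server-side audio buffer when speech starts.
function clearMessage() {
  return { type: "input_audio_buffer.clear" };
}

// Step 3: append one chunk of raw 16-bit / 24 kHz PCM audio, base64 encoded.
function appendMessage(pcmChunk) {
  return {
    type: "input_audio_buffer.append",
    audio: pcmChunk.toString("base64"),
  };
}

// Step 4: commit the buffered audio for transcription when speech ends.
function commitMessage() {
  return { type: "input_audio_buffer.commit" };
}
```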
Moreover, other messages from the Realtime service are handled, like errors, etc.

Problem:

Occasionally, the connection is closed with code 1006. That is, the connection is not closed by the client but on the server side (the Realtime service):

Sometimes this happens while transcription is running (that is, while step 3 is in progress and audio is being committed).
Sometimes this happens while the server is idle (before step 1 has fired).
The problem occurs between 5 and 15 minutes after opening the connection.

Notes:

I have implemented reconnection functionality that kicks in 25 minutes after the connection is opened. This is due to the 30-minute hard limit on how long the connection can live, so that limit is not the problem.
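That proactive reconnect can be sketched as a simple timer (a sketch; the 25-minute figure is the one mentioned above, and reconnect() is assumed to tear down the old socket and open a fresh one):

```javascript
// Reconnect before the documented 30-minute hard limit on connection lifetime.
const RECONNECT_AFTER_MS = 25 * 60 * 1000; // 25 minutes

function scheduleReconnect(reconnect, afterMs = RECONNECT_AFTER_MS) {
  const timer = setTimeout(reconnect, afterMs);
  // Return a cancel function, for when the server closes the socket first.
  return () => clearTimeout(timer);
}
```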

I run multiple workers of the type described above, one per connected client. Between 0 and 10 workers exist on the server at any given time. Perhaps multiple concurrent connections are not allowed?
As mentioned, the Realtime service does not emit any error messages before closing the connection with 1006. I am watching for these error message types:

error
conversation.item.input_audio_transcription.failed

Perhaps there are more to check for.
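To catch any other events arriving before the drop, every incoming message type could be logged with a small classifier like this (a sketch; the failure event names are the two listed above):

```javascript
// The two failure events currently being watched for.
const FAILURE_EVENTS = new Set([
  "error",
  "conversation.item.input_audio_transcription.failed",
]);

// Parses a raw Realtime message and flags whether it is a known failure event.
function classifyEvent(raw) {
  const event = JSON.parse(raw);
  return {
    type: event.type,
    isFailure: FAILURE_EVENTS.has(event.type),
  };
}

// Usage: ws.on("message", (raw) => console.log(classifyEvent(raw)));
```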

Any ideas to why the connection is closed in such a manner?
Thanks!

Azure OpenAI in Foundry Models

Answer accepted by question author

SRILAKSHMI C 18,040 Reputation points Microsoft External Staff Moderator
    2025-11-25T12:53:57.9533333+00:00

    Hello Soender,

    Welcome to Microsoft Q&A and Thank you for reaching out.

    I understand that the WebSocket disconnect you’re seeing (close code 1006) is coming from the Realtime API side rather than your client. A 1006 generally means the server closed the connection abruptly without sending a proper close frame. With Azure’s Realtime endpoint, this can happen under a few specific conditions that aren’t always surfaced as explicit errors.

    From similar cases we’ve handled, there are a few patterns worth calling out:

    1. Why 1006 happens

    A 1006 usually points to an internal service safeguard being hit: things like idle timeouts, buffer pressure, audio arriving too fast, or the session running longer or “hotter” than expected. Azure’s Realtime service doesn’t currently emit a detailed error event before this happens, so the lack of a warning is expected.

    2. Long, continuous audio streams can trigger early closes

    Even though the maximum session duration is documented as ~30 minutes, continuous audio streaming (especially without natural pauses) can cause the server to end the session earlier, sometimes anywhere in the 5–20-minute range. This is tied to pacing, buffer windows, and internal session-health checks.

    3. Multiple workers can increase the likelihood

    Running several Realtime sessions at the same time is supported, but concurrency and capacity vary by region. We’ve seen cases where multiple long-running, audio-heavy sessions trigger protective throttles, especially when the service is under higher load.

    4. What helps reduce these disconnects

    A few adjustments tend to stabilize things:

    • Reconnect proactively every 5–10 minutes, rather than waiting for the 25–30 min ceiling.
    • Pace audio chunks consistently (around 20–50 ms intervals), and avoid micro-bursts.
    • After each transcription sequence, ensure the buffer is reset using: { "type": "input_audio_buffer.clear" }
    • Avoid long stretches of low-energy “silence” audio, which can keep buffers active unnecessarily.
    • If supported by your client, send periodic heartbeat pings.
    • Log all message types; sometimes lesser-known events show up before the drop.
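The chunk-pacing suggestion above can be sketched as follows (a sketch, assuming 16-bit mono PCM at 24 kHz as in the question's session config; the 20 ms frame size is one point in the suggested 20–50 ms range):

```javascript
// Bytes per paced chunk for 16-bit mono PCM: sampleRate * 2 bytes * seconds.
function chunkBytes(sampleRate, frameMs) {
  return Math.round(sampleRate * 2 * (frameMs / 1000));
}

// Splits a PCM buffer into evenly sized frames for steady, burst-free sending.
function splitIntoFrames(pcm, sampleRate = 24000, frameMs = 20) {
  const size = chunkBytes(sampleRate, frameMs); // 960 bytes at 24 kHz / 20 ms
  const frames = [];
  for (let i = 0; i < pcm.length; i += size) {
    frames.push(pcm.subarray(i, i + size));
  }
  return frames;
}

// Each frame would then be sent on a ~20 ms interval (e.g. with setInterval)
// rather than as one large burst.
```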

    5. Why it’s hard to reproduce consistently

    These disconnects usually appear only under real workloads, multiple streams, region-level load, or specific pacing patterns. That's why it feels random and doesn’t show any clear error in logs.

    Make sure your API key and endpoint configuration are correct. Verify that the target URL is well-formatted and points to the correct Azure region for your deployment. You can refer to this documentation for further assistance: Realtime API via WebSockets.

    I Hope this helps. Do let me know if you have any further queries.

    If this answers your query, could you please take a moment to accept this response? Your feedback is greatly appreciated.

    Thank you!



1 additional answer

Sort by: Most helpful
  1. Q&A Assist
    2025-11-22T17:02:23.7+00:00
    AI-generated content. This answer was created with AI from Q&A Assist.

    The issue you're experiencing with the WebSocket connection being closed with code 1006 can be attributed to several potential factors:

    1. Connection Stability: Code 1006 indicates that the connection was closed abnormally without a close frame being sent. This can happen due to network instability or interruptions in the connection. Ensure that your network is stable and that there are no firewall or proxy settings that might be interfering with the WebSocket connection.
    2. Server-Side Limits: Since you mentioned running multiple workers, it's possible that the Azure OpenAI service has limits on concurrent connections or sessions. If you exceed these limits, it could lead to unexpected disconnections. Review the Azure OpenAI documentation for any stated limits on concurrent connections.
    3. Timeouts: The connection may be timing out due to inactivity. Even though you have implemented a reconnection strategy, ensure that your application is sending periodic keep-alive messages to maintain the connection.
    4. Audio Input Handling: Since the connection closes during audio processing, double-check that the audio input being sent is correctly formatted and that the server can handle the incoming data without issues. Ensure that the audio chunks are being sent in a timely manner and that the input_audio_buffer.commit message is sent correctly after the audio chunks.
    5. Error Handling: While you are monitoring for specific error messages, consider implementing more comprehensive error handling and logging to capture any unexpected behavior or messages from the Realtime service that could provide clues about the disconnection.
    6. Session Configuration: Review your session configuration in the session.update payload to ensure that all required parameters are correctly set and that there are no conflicting settings that might lead to disconnections.

    If the problem persists, consider reaching out to Azure support for further assistance, as they may have more insight into server-side logs that could explain the disconnections.
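The keep-alive suggestion in point 3 above can be sketched like this (a sketch, assuming the "ws" npm package, whose sockets expose a ping() method and a readyState field):

```javascript
// Sends a WebSocket protocol ping every intervalMs while the socket is open,
// to keep intermediaries and idle timeouts from silently dropping it.
function startKeepAlive(ws, intervalMs = 15000) {
  const timer = setInterval(() => {
    if (ws.readyState === 1 /* OPEN */) ws.ping();
  }, intervalMs);
  // Return a stop function to call when the socket closes.
  return () => clearInterval(timer);
}
```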
