Voice Live 2026-06-01-preview API Reference

The Voice Live API provides real-time, bidirectional communication for voice-enabled applications using WebSocket connections.

The API uses JSON-formatted events sent over WebSocket connections to manage conversations, audio streams, avatar interactions, and real-time responses. Events are categorized into client events (sent from client to server) and server events (sent from server to client).

Note

2026-06-01-preview is a preview API version. Features and properties marked preview are subject to change before the next stable release.

What's new in 2026-06-01-preview

This API version adds the following capabilities on top of 2026-04-10:

  • azure-realtime-native voice type: A new structured voice object used exclusively with the azure-realtime model. The voice is specified as {"type": "azure-realtime-native", "name": "<voice>"} where <voice> is one of aarti, andrew, ava (default), denise, elsa, florian, francisca, meera, ximena, xiaoxiao, or yunxi.
  • Streaming text input client events: New input_text.delta and input_text.done client events let you stream text input into a conversation item incrementally, similar to how audio is streamed with input_audio_buffer.append.
  • Smart end-of-turn detection: New audio-based EOU detection variant with "model": "smart_end_of_turn_detection". It operates directly on the input audio stream and exposes the threshold_level (low, medium, high, default) and timeout_ms properties.
  • Parallel tool calls: New optional parallel_tool_calls boolean on the session object (default true). Set to false to require the model to issue tool calls sequentially.
  • Hosted agent invocation events: New server events for surfacing hosted agent invocation lifecycle and tool activity.
  • WebRTC feature events: Additional events that support the Voice Live WebRTC transport.

Endpoint and authentication

WebSocket endpoint

The WebSocket endpoint for the Voice Live API is:

wss://<your-ai-foundry-resource-name>.services.ai.azure.com/voice-live/realtime?api-version=2026-06-01-preview

For older resources that use the legacy domain, use:

wss://<your-ai-foundry-resource-name>.cognitiveservices.azure.com/voice-live/realtime?api-version=2026-06-01-preview

The endpoint is the same for all models. The only difference is the required model query parameter, or, when using the Microsoft Foundry Agent Service, the agent-name and agent-project-name query parameters. For more information about agent connection parameters, see Integrate Voice Live API with a Microsoft Foundry agent.

For example, an endpoint for a Microsoft Foundry resource that uses a model would be:

wss://<your-ai-foundry-resource-name>.services.ai.azure.com/voice-live/realtime?api-version=2026-06-01-preview&model=gpt-realtime
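As a minimal sketch, the endpoint URL could be assembled as follows. The helper name `build_voice_live_url` and its arguments are hypothetical and not part of any SDK; only the host names, path, and query parameter names are taken from this reference.

```python
from urllib.parse import urlencode

API_VERSION = "2026-06-01-preview"


def build_voice_live_url(resource_name, model=None, agent_name=None,
                         agent_project_name=None, legacy_domain=False):
    """Build the Voice Live WebSocket URL (illustrative helper, not an SDK call)."""
    domain = "cognitiveservices.azure.com" if legacy_domain else "services.ai.azure.com"
    params = {"api-version": API_VERSION}
    if model is not None:
        # Model-based connection: the model query parameter is required.
        params["model"] = model
    elif agent_name and agent_project_name:
        # Agent-based connection: both agent parameters replace the model.
        params["agent-name"] = agent_name
        params["agent-project-name"] = agent_project_name
    else:
        raise ValueError("pass either model or both agent_name and agent_project_name")
    return f"wss://{resource_name}.{domain}/voice-live/realtime?{urlencode(params)}"
```

Calling `build_voice_live_url("my-resource", model="gpt-realtime")` yields a URL of the same shape as the example above, with `my-resource` standing in for the resource name.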

Note

The Voice Live API is optimized for Microsoft Foundry resources. Microsoft Foundry resources are recommended for full feature availability. Azure AI Speech resources don't support Microsoft Foundry Agent Service integration or bring-your-own-model (BYOM).

Authentication

The Voice Live API supports two authentication methods:

  • Microsoft Entra ID (recommended): Use token-based authentication for a Microsoft Foundry resource. Pass the retrieved access token in one of two ways:
    • As a Bearer token in the Authorization header on the prehandshake connection. This option isn't available in a browser environment.
    • As an Authorization query string parameter on the request URI, with the value Bearer <token>. URL-encode the value as needed. Query string parameters are encrypted by the wss:// transport.
  • API key: Provide an api-key in one of two ways:
    • As an api-key connection header on the prehandshake connection. This option isn't available in a browser environment.
    • As an api-key query string parameter on the request URI. Query string parameters are encrypted by the wss:// transport.

For the recommended keyless authentication with Microsoft Entra ID:

  1. Assign the Cognitive Services User and Azure AI User roles to your user account or managed identity. You can assign roles in the Azure portal under Access control (IAM) > Add role assignment.
  2. Acquire an access token using the Azure CLI or an Azure SDK. The token must be issued for the https://ai.azure.com/.default scope (or the legacy https://cognitiveservices.azure.com/.default scope).
  3. Send the token on the WebSocket upgrade request, either in the Authorization header in the format Bearer <token>, or as an Authorization query string parameter with the same Bearer <token> value.
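The two ways of sending the token in step 3 can be sketched as below. This assumes an access token has already been acquired (token acquisition is not shown), and both function names are hypothetical helpers rather than SDK calls.

```python
from urllib.parse import quote


def auth_header(token):
    """Header form. Per the text above, not available in a browser environment."""
    return {"Authorization": f"Bearer {token}"}


def auth_query_url(url, token):
    """Query-string form: append a URL-encoded Authorization parameter."""
    separator = "&" if "?" in url else "?"
    # safe="" makes quote() percent-encode every reserved character,
    # including the space between "Bearer" and the token.
    value = quote(f"Bearer {token}", safe="")
    return f"{url}{separator}Authorization={value}"
```

The header form suits server-side clients that can set headers on the upgrade request; the query-string form is the one left for browser clients.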

Client Events

The Voice Live API supports the following client events that can be sent from the client to the server:

Event Description
session.update Update the session configuration including voice, output modalities, turn detection, and other settings
session.avatar.connect Establish avatar connection by providing client SDP for WebRTC negotiation
input_audio_buffer.append Append audio bytes to the input audio buffer
input_audio_buffer.commit Commit the input audio buffer for processing
input_audio_buffer.clear Clear the input audio buffer
input_text.delta Append a chunk of text to a streamed user-text input
input_text.done Signal that streamed user-text input is complete
conversation.item.create Add a new item to the conversation context
conversation.item.retrieve Retrieve a specific item from the conversation
conversation.item.truncate Truncate an assistant audio message
conversation.item.delete Remove an item from the conversation
response.create Instruct the server to create a response via model inference
response.cancel Cancel an in-progress response
output_audio_buffer.clear Stop the avatar from speaking by clearing the server-side output audio buffer (avatar mode only)

session.update

Update the session's configuration. This event can be sent at any time to modify settings such as voice, output modalities, turn detection, tools, and other session parameters. Note that once a session is initialized with a particular model, it can't be changed to another model.

Event Structure

{
  "type": "session.update",
  "session": {
    "modalities": ["text", "audio"],
    "voice": {
      "type": "openai",
      "name": "alloy"
    },
    "instructions": "You are a helpful assistant. Be concise and friendly.",
    "input_audio_format": "pcm16",
    "output_audio_format": "pcm16",
    "input_audio_sampling_rate": 24000,
    "turn_detection": {
      "type": "azure_semantic_vad",
      "threshold": 0.5,
      "prefix_padding_ms": 300,
      "silence_duration_ms": 500
    },
    "temperature": 0.8,
    "max_response_output_tokens": "inf"
  }
}

Properties

Field Type Description
type string Must be "session.update"
session RealtimeRequestSession Session configuration object with fields to update
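As a sketch, a session.update event that selects one of the azure-realtime-native voices listed under "What's new" might be built like this. The helper name is illustrative; the voice object shape, the voice names, and the parallel_tool_calls field come from this reference, and the event only includes the fields being changed.

```python
import json

# Voice names listed for the azure-realtime-native voice type.
AZURE_REALTIME_NATIVE_VOICES = {
    "aarti", "andrew", "ava", "denise", "elsa", "florian",
    "francisca", "meera", "ximena", "xiaoxiao", "yunxi",
}


def session_update_with_native_voice(name="ava", parallel_tool_calls=None):
    """Return a session.update event as a JSON string (hypothetical helper)."""
    if name not in AZURE_REALTIME_NATIVE_VOICES:
        raise ValueError(f"unknown azure-realtime-native voice: {name!r}")
    session = {"voice": {"type": "azure-realtime-native", "name": name}}
    if parallel_tool_calls is not None:
        # Optional boolean; False asks the model to issue tool calls sequentially.
        session["parallel_tool_calls"] = parallel_tool_calls
    return json.dumps({"type": "session.update", "session": session})
```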

Example with Azure Custom Voice

{
  "type": "session.update",
  "session": {
    "voice": {
      "type": "azure-custom",
      "name": "my-custom-voice",
      "endpoint_id": "12345678-1234-1234-1234-123456789012",
      "temperature": 0.7,
      "style": "cheerful"
    },
    "input_audio_noise_reduction": {
      "type": "azure_deep_noise_suppression"
    },
    "avatar": {
      "character": "lisa",
      "customized": false,
      "video": {
        "resolution": {
          "width": 1920,
          "height": 1080
        },
        "bitrate": 2000000
      }
    }
  }
}

session.avatar.connect

Establish an avatar connection by providing the client's SDP (Session Description Protocol) offer for WebRTC media negotiation. This event is required when using avatar features.

Event Structure

{
  "type": "session.avatar.connect",
  "client_sdp": "<client_sdp>"
}

Properties

Field Type Description
type string Must be "session.avatar.connect"
client_sdp string The client's SDP offer for WebRTC connection establishment, encoded with base64

input_audio_buffer.append

Append audio bytes to the input audio buffer.

Event Structure

{
  "type": "input_audio_buffer.append",
  "audio": "UklGRiQAAABXQVZFZm10IBAAAAABAAEARKwAAIhYAQACABAAZGF0YQAAAAA="
}

Properties

Field Type Description
type string Must be "input_audio_buffer.append"
audio string Base64-encoded audio data

input_audio_buffer.commit

Commit the input audio buffer for processing.

Event Structure

{
  "type": "input_audio_buffer.commit"
}

Properties

Field Type Description
type string Must be "input_audio_buffer.commit"

input_audio_buffer.clear

Clear the input audio buffer.

Event Structure

{
  "type": "input_audio_buffer.clear"
}

Properties

Field Type Description
type string Must be "input_audio_buffer.clear"

input_text.delta

Append a chunk of text to the current streamed user-text input. Use this event to stream text into a conversation item incrementally, similar to how audio is streamed with input_audio_buffer.append. The streamed text is finalized by sending an input_text.done event.

Event Structure

{
  "type": "input_text.delta",
  "delta": "Hello, "
}

Properties

Field Type Description
type string Must be "input_text.delta"
delta string The incremental text content to append to the current streamed input.

input_text.done

Signal that the streamed user-text input is complete. The accumulated text becomes a user message item in the conversation.

Event Structure

{
  "type": "input_text.done"
}

Properties

Field Type Description
type string Must be "input_text.done"
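The delta/done pairing can be sketched as a small generator that turns text chunks into the corresponding client events. The function name is hypothetical, and skipping empty chunks is a choice made for this example, not a documented requirement.

```python
def stream_text_events(chunks):
    """Yield input_text.delta events for each chunk, then one input_text.done."""
    for chunk in chunks:
        if chunk:  # skip empty chunks; they would add nothing to the input
            yield {"type": "input_text.delta", "delta": chunk}
    # Finalizes the streamed input so it becomes a user message item.
    yield {"type": "input_text.done"}
```

Each yielded dictionary would be serialized to JSON and sent over the WebSocket connection in order.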

conversation.item.create

Add a new item to the conversation context. This can include messages, function calls, and function call responses. Items can be inserted at specific positions in the conversation history.

Event Structure

{
  "type": "conversation.item.create",
  "previous_item_id": "item_ABC123",
  "item": {
    "id": "item_DEF456",
    "type": "message",
    "role": "user",
    "content": [
      {
        "type": "input_text",
        "text": "Hello, how are you?"
      }
    ]
  }
}

Properties

Field Type Description
type string Must be "conversation.item.create"
previous_item_id string Optional. ID of the item after which to insert this item. If not provided, appends to end
item RealtimeConversationRequestItem The item to add to the conversation

Example with Audio Content

{
  "type": "conversation.item.create",
  "item": {
    "type": "message",
    "role": "user",
    "content": [
      {
        "type": "input_audio",
        "audio": "UklGRiQAAABXQVZFZm10IBAAAAABAAEARKwAAIhYAQACABAAZGF0YQAAAAA=",
        "transcript": "Hello there"
      }
    ]
  }
}

Example with Function Call output

{
  "type": "conversation.item.create",
  "item": {
    "type": "function_call_output",
    "call_id": "call_123",
    "output": "{\"location\": \"San Francisco\", \"temperature\": \"70\"}"
  }
}
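Note that output in the example above is a JSON-encoded string rather than a nested object. A minimal sketch of producing such an event from a Python result follows; the helper name is hypothetical.

```python
import json


def function_call_output_event(call_id, result):
    """Wrap a tool result in a conversation.item.create event."""
    return {
        "type": "conversation.item.create",
        "item": {
            "type": "function_call_output",
            "call_id": call_id,
            # Serialize the result so that "output" is a string, as in the example.
            "output": json.dumps(result),
        },
    }
```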

Example with MCP approval response

{
  "type": "conversation.item.create",
  "item": {
    "type": "mcp_approval_response",
    "approval_request_id": "mcp_approval_req_456",
    "approve": true
  }
}

conversation.item.retrieve

Retrieve a specific item from the conversation history. This is useful for inspecting processed audio after noise cancellation and VAD.

Event Structure

{
  "type": "conversation.item.retrieve",
  "item_id": "item_ABC123"
}

Properties

Field Type Description
type string Must be "conversation.item.retrieve"
item_id string The ID of the item to retrieve

conversation.item.truncate

Truncate an assistant message's audio content. This is useful for stopping playback at a specific point and synchronizing the server's understanding with the client's state.

Event Structure

{
  "type": "conversation.item.truncate",
  "item_id": "item_ABC123",
  "content_index": 0,
  "audio_end_ms": 5000
}

Properties

Field Type Description
type string Must be "conversation.item.truncate"
item_id string The ID of the assistant message item to truncate
content_index integer The index of the content part to truncate
audio_end_ms integer The duration up to which to truncate the audio, in milliseconds
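One way a client might derive audio_end_ms is from the number of audio bytes it has actually played. The sketch below assumes mono pcm16 output (2 bytes per sample) at 24,000 Hz; the output sampling rate is an assumption here, and the helper name is hypothetical.

```python
def truncate_event(item_id, played_bytes, sample_rate=24000, content_index=0):
    """Build conversation.item.truncate from the amount of audio played so far."""
    played_samples = played_bytes // 2  # pcm16: 2 bytes per mono sample (assumed)
    audio_end_ms = played_samples * 1000 // sample_rate
    return {
        "type": "conversation.item.truncate",
        "item_id": item_id,
        "content_index": content_index,
        "audio_end_ms": audio_end_ms,
    }
```

Under these assumptions, 240,000 played bytes correspond to 120,000 samples, or 5,000 ms.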

conversation.item.delete

Remove an item from the conversation history.

Event Structure

{
  "type": "conversation.item.delete",
  "item_id": "item_ABC123"
}

Properties

Field Type Description
type string Must be "conversation.item.delete"
item_id string The ID of the item to delete

response.create

Instruct the server to create a response via model inference. This event can specify response-specific configuration that overrides session defaults.

Event Structure

{
  "type": "response.create",
  "response": {
    "modalities": ["text", "audio"],
    "instructions": "Be extra helpful and detailed.",
    "voice": {
      "type": "openai",
      "name": "alloy"
    },
    "output_audio_format": "pcm16",
    "temperature": 0.7,
    "max_response_output_tokens": 1000
  }
}

Properties

Field Type Description
type string Must be "response.create"
response RealtimeResponseOptions Optional response configuration that overrides session defaults

Example with Tool Choice

{
  "type": "response.create",
  "response": {
    "modalities": ["text"],
    "tools": [
      {
        "type": "function",
        "name": "get_current_time",
        "description": "Get the current time",
        "parameters": {
          "type": "object",
          "properties": {}
        }
      }
    ],
    "tool_choice": "get_current_time",
    "temperature": 0.3
  }
}

Example with Animation

{
  "type": "response.create",
  "response": {
    "modalities": ["audio", "animation"],
    "animation": {
      "model_name": "default",
      "outputs": ["blendshapes", "viseme_id"]
    },
    "voice": {
      "type": "azure-custom",
      "name": "my-expressive-voice",
      "endpoint_id": "12345678-1234-1234-1234-123456789012",
      "style": "excited"
    }
  }
}

Example with pre-generated assistant message

In some scenarios, you might want to generate an audio response for predefined text instead of having the model generate the text response. Use the pre_generated_assistant_message parameter in the response.create message. You can only include one text entry in the content field.

{
  "type": "response.create",
  "response": {
    "pre_generated_assistant_message": {
      "type": "message",
      "role": "assistant",
      "content": [
        {
          "type": "text",
          "text": "repeat what I say"
        }
      ]
    }
  }
}

When the service receives this message, it generates an audio response for the predefined text. The message is also added to the conversation context history.

response.cancel

Cancel an in-progress response. This immediately stops response generation and related audio output.

Event Structure

{
  "type": "response.cancel"
}

Properties

Field Type Description
type string Must be "response.cancel"

output_audio_buffer.clear

Clear the server-side output audio buffer. In the current preview, this event is only supported in avatar mode and is used to stop the avatar from speaking by clearing any audio (and corresponding avatar video) that the server has queued for playback. The server responds with an output_audio_buffer.cleared event.

Event Structure

{
  "type": "output_audio_buffer.clear"
}

Properties

Field Type Description
type string Must be "output_audio_buffer.clear"

input_audio_buffer.append

The client input_audio_buffer.append event is used to append audio bytes to the input audio buffer. The audio buffer is temporary storage you can write to and later commit.

In Server VAD (Voice Activity Detection) mode, the audio buffer is used to detect speech and the server decides when to commit. When server VAD is disabled, the client can choose how much audio to place in each event up to a maximum of 15 MiB. For example, streaming smaller chunks from the client can allow the VAD to be more responsive.

Unlike most other client events, the server doesn't send a confirmation response to the client input_audio_buffer.append event.

Event Structure

{
  "type": "input_audio_buffer.append",
  "audio": "<audio>"
}

Properties

Field Type Description
type string The event type must be input_audio_buffer.append.
audio string Base64-encoded audio bytes. This value must be in the format specified by the input_audio_format field in the session configuration.
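A minimal sketch of splitting raw audio into append events is shown below. It assumes mono pcm16 input at 24,000 Hz, matching the session examples in this reference; the 100 ms chunk size and the function name are arbitrary choices for illustration.

```python
import base64


def audio_append_events(pcm, sample_rate=24000, chunk_ms=100):
    """Yield input_audio_buffer.append events for raw 16-bit mono PCM bytes."""
    bytes_per_chunk = sample_rate * 2 * chunk_ms // 1000  # 2 bytes per sample
    for start in range(0, len(pcm), bytes_per_chunk):
        chunk = pcm[start:start + bytes_per_chunk]
        yield {
            "type": "input_audio_buffer.append",
            # The audio field carries base64 text, not raw bytes.
            "audio": base64.b64encode(chunk).decode("ascii"),
        }
```

Because the server sends no confirmation for this event, a client would typically send these events back to back without waiting for a reply.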

input_audio_buffer.clear

The client input_audio_buffer.clear event is used to clear the audio bytes in the buffer.

The server responds with an input_audio_buffer.cleared event.

Event Structure

{
  "type": "input_audio_buffer.clear"
}

Properties

Field Type Description
type string The event type must be input_audio_buffer.clear.

input_audio_buffer.commit

The client input_audio_buffer.commit event is used to commit the user input audio buffer, which creates a new user message item in the conversation. Audio is transcribed if input_audio_transcription is configured for the session.

In server VAD mode, the client doesn't need to send this event because the server commits the audio buffer automatically. Without server VAD, the client must commit the audio buffer to create a user message item. This client event produces an error if the input audio buffer is empty.

Committing the input audio buffer doesn't create a response from the model.

The server responds with an input_audio_buffer.committed event.

Event Structure

{
  "type": "input_audio_buffer.commit"
}

Properties

Field Type Description
type string The event type must be input_audio_buffer.commit.
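Because committing doesn't trigger a response, a client that manages turns itself (server VAD disabled) would follow the commit with a response.create event. The sequence can be sketched as follows; the function name is hypothetical and the input is assumed to be raw audio bytes already in the session's input format.

```python
import base64


def manual_turn_events(audio_chunks):
    """Return the client events for one user turn without server VAD."""
    events = [
        {"type": "input_audio_buffer.append",
         "audio": base64.b64encode(chunk).decode("ascii")}
        for chunk in audio_chunks if chunk
    ]
    if not events:
        # Committing an empty input audio buffer produces an error.
        raise ValueError("no audio to commit")
    events.append({"type": "input_audio_buffer.commit"})  # creates the user message item
    events.append({"type": "response.create"})            # asks the model to respond
    return events
```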

Server Events

The Voice Live API sends the following server events to communicate status, responses, and data to the client:

Event Description
error Indicates an error occurred during processing
warning Indicates a warning occurred that doesn't interrupt the conversation flow
session.created Sent when a new session is successfully established
session.updated Sent when session configuration is updated
session.avatar.connecting Indicates avatar WebRTC connection is being established
conversation.item.created Sent when a new item is added to the conversation
conversation.item.retrieved Response to conversation.item.retrieve request
conversation.item.truncated Confirms item truncation
conversation.item.deleted Confirms item deletion
conversation.item.input_audio_transcription.completed Input audio transcription is complete
conversation.item.input_audio_transcription.delta Streaming input audio transcription
conversation.item.input_audio_transcription.failed Input audio transcription failed
input_audio_buffer.committed Input audio buffer was committed for processing
input_audio_buffer.cleared Input audio buffer was cleared
input_audio_buffer.speech_started Speech detected in input audio buffer (VAD)
input_audio_buffer.speech_stopped Speech ended in input audio buffer (VAD)
response.created New response generation started
response.done Response generation is complete
response.output_item.added New output item added to response
response.output_item.done Output item is complete
response.content_part.added New content part added to output item
response.content_part.done Content part is complete
response.text.delta Streaming text content from the model
response.text.done Text content is complete
response.audio_transcript.delta Streaming audio transcript
response.audio_transcript.done Audio transcript is complete
response.audio.delta Streaming audio content from the model
response.audio.done Audio content is complete
response.animation_blendshapes.delta Streaming animation blendshapes data
response.animation_blendshapes.done Animation blendshapes data is complete
response.audio_timestamp.delta Streaming audio timestamp information
response.audio_timestamp.done Audio timestamp information is complete
response.animation_viseme.delta Streaming animation viseme data
response.animation_viseme.done Animation viseme data is complete
response.function_call_arguments.delta Streaming function call arguments
response.function_call_arguments.done Function call arguments are complete
mcp_list_tools.in_progress MCP tool listing is in progress
mcp_list_tools.completed MCP tool listing is completed
mcp_list_tools.failed MCP tool listing has failed
response.mcp_call_arguments.delta Streaming MCP call arguments
response.mcp_call_arguments.done MCP call arguments are complete
response.mcp_call.in_progress MCP call is in progress
response.mcp_call.completed MCP call is completed
response.mcp_call.failed MCP call has failed
response.foundry_agent_call_arguments.delta Streaming foundry agent call arguments
response.foundry_agent_call_arguments.done Foundry agent call arguments are complete
response.foundry_agent_call.in_progress Foundry agent call is in progress
response.foundry_agent_call.completed Foundry agent call is completed
response.foundry_agent_call.failed Foundry agent call has failed
session.avatar.switch_to_speaking Avatar transitioned to the speaking state
session.avatar.switch_to_idle Avatar transitioned to the idle state
response.video.delta Streaming avatar video frame data
response.web_search_call.searching Web search tool call is searching
response.web_search_call.in_progress Web search tool call is in progress
response.web_search_call.completed Web search tool call completed
response.file_search_call.searching File search tool call is searching
response.file_search_call.in_progress File search tool call is in progress
response.file_search_call.completed File search tool call completed
output_audio_buffer.cleared Output audio buffer was cleared
response.audio_transcript.annotation.added An annotation was added to an audio transcript

session.created

Sent when a new session is successfully established. This is the first event received after connecting to the API.

Event Structure

{
  "type": "session.created",
  "session": {
    "id": "sess_ABC123DEF456",
    "object": "realtime.session",
    "model": "gpt-realtime",
    "modalities": ["text", "audio"],
    "instructions": "You are a helpful assistant.",
    "voice": {
      "type": "openai",
      "name": "alloy"
    },
    "input_audio_format": "pcm16",
    "output_audio_format": "pcm16",
    "input_audio_sampling_rate": 24000,
    "turn_detection": {
      "type": "azure_semantic_vad",
      "threshold": 0.5,
      "prefix_padding_ms": 300,
      "silence_duration_ms": 500
    },
    "temperature": 0.8,
    "max_response_output_tokens": "inf"
  }
}

Properties

Field Type Description
type string Must be "session.created"
session RealtimeResponseSession The created session object

session.updated

Sent when session configuration is successfully updated in response to a session.update client event.

Event Structure

{
  "type": "session.updated",
  "session": {
    "id": "sess_ABC123DEF456",
    "voice": {
      "type": "azure-custom",
      "name": "my-voice",
      "endpoint_id": "12345678-1234-1234-1234-123456789012"
    },
    "temperature": 0.7,
    "avatar": {
      "character": "lisa",
      "customized": false
    }
  }
}

Properties

Field Type Description
type string Must be "session.updated"
session RealtimeResponseSession The updated session object

session.avatar.connecting

Indicates that an avatar WebRTC connection is being established. This event is sent in response to a session.avatar.connect client event.

Event Structure

{
  "type": "session.avatar.connecting",
  "server_sdp": "<server_sdp>"
}

Properties

Field Type Description
type string Must be "session.avatar.connecting"
server_sdp string The server's SDP answer for WebRTC negotiation

conversation.item.created

Sent when a new item is added to the conversation, either through a client conversation.item.create event or automatically during response generation.

Event Structure

{
  "type": "conversation.item.created",
  "previous_item_id": "item_ABC123",
  "item": {
    "id": "item_DEF456",
    "object": "realtime.item",
    "type": "message",
    "status": "completed",
    "role": "user",
    "content": [
      {
        "type": "input_text",
        "text": "Hello, how are you?"
      }
    ]
  }
}

Properties

Field Type Description
type string Must be "conversation.item.created"
previous_item_id string ID of the item after which this item was inserted
item RealtimeConversationResponseItem The created conversation item

Example with Audio Item

{
  "type": "conversation.item.created",
  "item": {
    "id": "item_GHI789",
    "type": "message",
    "status": "completed",
    "role": "user",
    "content": [
      {
        "type": "input_audio",
        "audio": null,
        "transcript": "What's the weather like today?"
      }
    ]
  }
}

conversation.item.retrieved

Sent in response to a conversation.item.retrieve client event, providing the requested conversation item.

Event Structure

{
  "type": "conversation.item.retrieved",
  "item": {
    "id": "item_ABC123",
    "object": "realtime.item",
    "type": "message",
    "status": "completed",
    "role": "assistant",
    "content": [
      {
        "type": "audio",
        "audio": "UklGRiQAAABXQVZFZm10IBAAAAABAAEARKwAAIhYAQACABAAZGF0YQAAAAA=",
        "transcript": "Hello! I'm doing well, thank you for asking. How can I help you today?"
      }
    ]
  }
}

Properties

Field Type Description
type string Must be "conversation.item.retrieved"
item RealtimeConversationResponseItem The retrieved conversation item

conversation.item.truncated

The server conversation.item.truncated event is returned when the client truncates an earlier assistant audio message item with a conversation.item.truncate event. This event is used to synchronize the server's understanding of the audio with the client's playback.

This event truncates the audio and removes the server-side text transcript to ensure there's no text in the context that the user doesn't know about.

Event Structure

{
  "type": "conversation.item.truncated",
  "item_id": "<item_id>",
  "content_index": 0,
  "audio_end_ms": 0
}

Properties

Field Type Description
type string The event type must be conversation.item.truncated.
item_id string The ID of the assistant message item that was truncated.
content_index integer The index of the content part that was truncated.
audio_end_ms integer The duration up to which the audio was truncated, in milliseconds.

conversation.item.deleted

Sent in response to a conversation.item.delete client event, confirming that the specified item was removed from the conversation.

Event Structure

{
  "type": "conversation.item.deleted",
  "item_id": "item_ABC123"
}

Properties

Field Type Description
type string Must be "conversation.item.deleted"
item_id string ID of the deleted item

response.created

Sent when a new response generation begins. This is the first event in a response sequence.

Event Structure

{
  "type": "response.created",
  "response": {
    "id": "resp_ABC123",
    "object": "realtime.response",
    "status": "in_progress",
    "status_details": null,
    "output": [],
    "usage": {
      "total_tokens": 0,
      "input_tokens": 0,
      "output_tokens": 0
    }
  }
}

Properties

Field Type Description
type string Must be "response.created"
response RealtimeResponse The response object that was created

response.done

Sent when response generation is complete. This event contains the final response with all output items and usage statistics.

Event Structure

{
  "type": "response.done",
  "response": {
    "id": "resp_ABC123",
    "object": "realtime.response",
    "status": "completed",
    "status_details": null,
    "output": [
      {
        "id": "item_DEF456",
        "object": "realtime.item",
        "type": "message",
        "status": "completed",
        "role": "assistant",
        "content": [
          {
            "type": "text",
            "text": "Hello! I'm doing well, thank you for asking. How can I help you today?"
          }
        ]
      }
    ],
    "usage": {
      "total_tokens": 87,
      "input_tokens": 52,
      "output_tokens": 35,
      "input_token_details": {
        "cached_tokens": 0,
        "text_tokens": 45,
        "audio_tokens": 7
      },
      "output_token_details": {
        "text_tokens": 15,
        "audio_tokens": 20
      }
    }
  }
}

Properties

Field Type Description
type string Must be "response.done"
response RealtimeResponse The completed response object
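As a sketch, the usage statistics in a response.done event could be reduced to a few counters for logging or metering. The function name and the shape of the returned dictionary are illustrative choices; the field names read from the event are those shown in the example above.

```python
def summarize_usage(event):
    """Extract token counts from a response.done event (illustrative helper)."""
    if event.get("type") != "response.done":
        raise ValueError("expected a response.done event")
    usage = event["response"].get("usage") or {}
    input_details = usage.get("input_token_details") or {}
    return {
        "total": usage.get("total_tokens", 0),
        "input": usage.get("input_tokens", 0),
        "output": usage.get("output_tokens", 0),
        "cached_input": input_details.get("cached_tokens", 0),
    }
```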

response.output_item.added

Sent when a new output item is added to the response during generation.

Event Structure

{
  "type": "response.output_item.added",
  "response_id": "resp_ABC123",
  "output_index": 0,
  "item": {
    "id": "item_DEF456",
    "object": "realtime.item",
    "type": "message",
    "status": "in_progress",
    "role": "assistant",
    "content": []
  }
}

Properties

Field Type Description
type string Must be "response.output_item.added"
response_id string ID of the response this item belongs to
output_index integer Index of the item in the response's output array
item RealtimeConversationResponseItem The output item that was added

response.output_item.done

Sent when an output item is complete.

Event Structure

{
  "type": "response.output_item.done",
  "response_id": "resp_ABC123",
  "output_index": 0,
  "item": {
    "id": "item_DEF456",
    "object": "realtime.item",
    "type": "message",
    "status": "completed",
    "role": "assistant",
    "content": [
      {
        "type": "text",
        "text": "Hello! I'm doing well, thank you for asking."
      }
    ]
  }
}

Properties

Field Type Description
type string Must be "response.output_item.done"
response_id string ID of the response this item belongs to
output_index integer Index of the item in the response's output array
item RealtimeConversationResponseItem The completed output item

response.content_part.added

The server response.content_part.added event is returned when a new content part is added to an assistant message item during response generation.

Event Structure

{
  "type": "response.content_part.added",
  "response_id": "resp_ABC123",
  "item_id": "item_DEF456",
  "output_index": 0,
  "content_index": 0,
  "part": {
    "type": "text",
    "text": ""
  }
}

Properties

Field Type Description
type string Must be "response.content_part.added"
response_id string ID of the response
item_id string ID of the item this content part belongs to
output_index integer Index of the item in the response
content_index integer Index of this content part in the item
part RealtimeContentPart The content part that was added

response.content_part.done

The server response.content_part.done event is returned when a content part is done streaming in an assistant message item.

This event is also returned when a response is interrupted, incomplete, or cancelled.

Event Structure

{
  "type": "response.content_part.done",
  "response_id": "resp_ABC123",
  "item_id": "item_DEF456",
  "output_index": 0,
  "content_index": 0,
  "part": {
    "type": "text",
    "text": "Hello! I'm doing well, thank you for asking."
  }
}

Properties

Field Type Description
type string Must be "response.content_part.done"
response_id string ID of the response
item_id string ID of the item this content part belongs to
output_index integer Index of the item in the response
content_index integer Index of this content part in the item
part RealtimeContentPart The completed content part

response.text.delta

Streaming text content from the model. Sent incrementally as the model generates text.

Event Structure

{
  "type": "response.text.delta",
  "response_id": "resp_ABC123",
  "item_id": "item_DEF456",
  "output_index": 0,
  "content_index": 0,
  "delta": "Hello! I'm"
}

Properties

Field Type Description
type string Must be "response.text.delta"
response_id string ID of the response
item_id string ID of the item
output_index integer Index of the item in the response
content_index integer Index of the content part
delta string Incremental text content

response.text.done

Sent when text content generation is complete.

Event Structure

{
  "type": "response.text.done",
  "response_id": "resp_ABC123",
  "item_id": "item_DEF456",
  "output_index": 0,
  "content_index": 0,
  "text": "Hello! I'm doing well, thank you for asking. How can I help you today?"
}

Properties

Field Type Description
type string Must be "response.text.done"
response_id string ID of the response
item_id string ID of the item
output_index integer Index of the item in the response
content_index integer Index of the content part
text string The complete text content
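
The response.text.delta and response.text.done events can be combined on the client. The following is a minimal sketch, assuming each event has already been parsed from JSON into a dictionary. The TextAssembler class and its method names are hypothetical and not part of the API.

```python
from collections import defaultdict


class TextAssembler:
    """Hypothetical helper: rebuilds text per (item_id, content_index)."""

    def __init__(self):
        self._parts = defaultdict(list)  # deltas received so far
        self.completed = {}              # final text taken from done events

    def handle(self, event: dict) -> None:
        key = (event.get("item_id"), event.get("content_index"))
        if event["type"] == "response.text.delta":
            self._parts[key].append(event["delta"])
        elif event["type"] == "response.text.done":
            # The done event carries the complete text, so prefer it over
            # the locally joined deltas and drop the buffered chunks.
            self._parts.pop(key, None)
            self.completed[key] = event["text"]

    def partial(self, item_id: str, content_index: int) -> str:
        """Text streamed so far for a content part that is still open."""
        return "".join(self._parts.get((item_id, content_index), []))
```

A client could call handle for every server event and render partial while streaming, then switch to completed once the done event arrives.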

response.audio.delta

Streaming audio content from the model. Audio is provided as base64-encoded data.

Event structure

{
  "type": "response.audio.delta",
  "response_id": "resp_ABC123",
  "item_id": "item_DEF456",
  "output_index": 0,
  "content_index": 0,
  "delta": "UklGRiQAAABXQVZFZm10IBAAAAABAAEARKwAAIhYAQACABAAZGF0YQAAAAA="
}

Properties

Field Type Description
type string Must be "response.audio.delta"
response_id string ID of the response
item_id string ID of the item
output_index integer Index of the item in the response
content_index integer Index of the content part
delta string Base64-encoded audio data chunk
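
Because the delta field is base64 encoded, a client has to decode it before playback or storage. The following is a minimal sketch that uses only the Python standard library. The decode_audio_delta function name is hypothetical, and the sketch does not inspect the output audio format configured for the session, which determines how the decoded bytes should be interpreted.

```python
import base64


def decode_audio_delta(event: dict) -> bytes:
    """Return the raw bytes carried by a response.audio.delta event."""
    if event.get("type") != "response.audio.delta":
        raise ValueError("not a response.audio.delta event")
    # validate=True rejects characters outside the base64 alphabet
    # instead of silently discarding them.
    return base64.b64decode(event["delta"], validate=True)


# The sample event shown in this section.
sample = {
    "type": "response.audio.delta",
    "response_id": "resp_ABC123",
    "item_id": "item_DEF456",
    "output_index": 0,
    "content_index": 0,
    "delta": "UklGRiQAAABXQVZFZm10IBAAAAABAAEARKwAAIhYAQACABAAZGF0YQAAAAA=",
}
chunk = decode_audio_delta(sample)
```

Appending each decoded chunk to a per-item buffer, in arrival order, is one way to rebuild the full audio stream.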

response.audio.done

Sent when audio content generation is complete.

Event structure

{
  "type": "response.audio.done",
  "response_id": "resp_ABC123",
  "item_id": "item_DEF456",
  "output_index": 0,
  "content_index": 0
}

Properties

Field Type Description
type string Must be "response.audio.done"
response_id string ID of the response
item_id string ID of the item
output_index integer Index of the item in the response
content_index integer Index of the content part

response.audio_transcript.delta

Streaming transcript of the generated audio content.

Event structure

{
  "type": "response.audio_transcript.delta",
  "response_id": "resp_ABC123",
  "item_id": "item_DEF456",
  "output_index": 0,
  "content_index": 0,
  "delta": "Hello! I'm doing"
}

Properties

Field Type Description
type string Must be "response.audio_transcript.delta"
response_id string ID of the response
item_id string ID of the item
output_index integer Index of the item in the response
content_index integer Index of the content part
delta string Incremental transcript text

response.audio_transcript.done

Sent when audio transcript generation is complete.

Event structure

{
  "type": "response.audio_transcript.done",
  "response_id": "resp_ABC123",
  "item_id": "item_DEF456",
  "output_index": 0,
  "content_index": 0,
  "transcript": "Hello! I'm doing well, thank you for asking. How can I help you today?"
}

Properties

Field Type Description
type string Must be "response.audio_transcript.done"
response_id string ID of the response
item_id string ID of the item
output_index integer Index of the item in the response
content_index integer Index of the content part
transcript string The complete transcript text

conversation.item.input_audio_transcription.completed

The server conversation.item.input_audio_transcription.completed event is the result of audio transcription for speech written to the audio buffer.

Transcription begins when the input audio buffer is committed by the client or server (in server_vad mode). Transcription runs asynchronously with response creation, so this event can come before or after the response events.

Realtime API models accept audio natively, so input transcription is a separate process that runs on a separate speech recognition model such as whisper-1. As a result, the transcript can diverge somewhat from the model's interpretation and should be treated as a rough guide.

Event structure

{
  "type": "conversation.item.input_audio_transcription.completed",
  "item_id": "<item_id>",
  "content_index": 0,
  "transcript": "<transcript>"
}

Properties

Field Type Description
type string The event type must be conversation.item.input_audio_transcription.completed.
item_id string The ID of the user message item containing the audio.
content_index integer The index of the content part containing the audio.
transcript string The transcribed text.
logprobs array of LogProbProperties Optional. The log probabilities of the transcription tokens.
phrases array of TranscriptionPhrase Optional. The transcription phrases with timing information.

conversation.item.input_audio_transcription.delta

The server conversation.item.input_audio_transcription.delta event is returned when input audio transcription is configured, and a transcription request for a user message is in progress. This event provides partial transcription results as they become available.

Event structure

{
  "type": "conversation.item.input_audio_transcription.delta",
  "item_id": "<item_id>",
  "content_index": 0,
  "delta": "<delta>"
}

Properties

Field Type Description
type string The event type must be conversation.item.input_audio_transcription.delta.
item_id string The ID of the user message item.
content_index integer The index of the content part containing the audio.
delta string The incremental transcription text.

conversation.item.input_audio_transcription.failed

The server conversation.item.input_audio_transcription.failed event is returned when input audio transcription is configured, and a transcription request for a user message failed. This event is separate from other error events so that the client can identify the related item.

Event structure

{
  "type": "conversation.item.input_audio_transcription.failed",
  "item_id": "<item_id>",
  "content_index": 0,
  "error": {
    "code": "<code>",
    "message": "<message>",
    "param": "<param>"
  }
}

Properties

Field Type Description
type string The event type must be conversation.item.input_audio_transcription.failed.
item_id string The ID of the user message item.
content_index integer The index of the content part containing the audio.
error object Details of the transcription error. See nested properties in the next table.

Error properties

Field Type Description
type string The type of error.
code string Error code, if any.
message string A human-readable error message.
param string Parameter related to the error, if any.
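
Because input transcription runs asynchronously with response creation, a client can't rely on the order in which the delta, completed, and failed events arrive relative to response events. One option is to fold all three into a per-item record keyed by item_id and content_index. The following is a minimal sketch under that assumption. The InputTranscript class and the apply_transcription_event function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

PREFIX = "conversation.item.input_audio_transcription."


@dataclass
class InputTranscript:
    partial: str = ""              # text accumulated from delta events
    final: Optional[str] = None    # transcript from the completed event
    error: Optional[dict] = None   # error object from the failed event


def apply_transcription_event(store: dict, event: dict) -> None:
    """Update store in place; events of other types are ignored."""
    kind = event.get("type", "")
    if not kind.startswith(PREFIX):
        return
    key = (event["item_id"], event["content_index"])
    record = store.setdefault(key, InputTranscript())
    suffix = kind[len(PREFIX):]
    if suffix == "delta":
        record.partial += event["delta"]
    elif suffix == "completed":
        record.final = event["transcript"]
    elif suffix == "failed":
        record.error = event["error"]
```

With this layout, the UI can show partial while final is None and attach the final transcript to the user message whenever it arrives.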

response.animation_blendshapes.delta

The server response.animation_blendshapes.delta event is returned when the model generates animation blendshapes data as part of a response. This event provides incremental blendshapes data as it becomes available.

Event structure

{
  "type": "response.animation_blendshapes.delta",
  "response_id": "resp_ABC123",
  "item_id": "item_DEF456",
  "output_index": 0,
  "content_index": 0,
  "frame_index": 0,
  "frames": [
    [0.0, 0.1, 0.2, ..., 1.0]
    ...
  ]
}

Properties

Field Type Description
type string The event type must be response.animation_blendshapes.delta.
response_id string ID of the response
item_id string ID of the item
output_index integer Index of the item in the response
content_index integer Index of the content part
frame_index integer Index of the first frame in this batch of frames
frames array of array of float Array of blendshape frames. Each frame is an array of blendshape values.
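
Since frame_index identifies the first frame in a batch, a client can derive an absolute index for every frame it receives. The following is a minimal sketch, assuming that frames within a batch are numbered consecutively starting at frame_index. That assumption is an inference from the field description, and the index_frames function name is hypothetical.

```python
def index_frames(event: dict) -> dict:
    """Map absolute frame index to the frame's list of blendshape values."""
    if event.get("type") != "response.animation_blendshapes.delta":
        raise ValueError("not a response.animation_blendshapes.delta event")
    first = event["frame_index"]
    # Assumption: frame i of the batch has absolute index first + i.
    return {first + offset: frame for offset, frame in enumerate(event["frames"])}
```

Merging the dictionaries returned for successive delta events gives one timeline per item that a renderer could consume in index order.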

response.animation_blendshapes.done

The server response.animation_blendshapes.done event is returned when the model has finished generating animation blendshapes data as part of a response.

Event structure

{
  "type": "response.animation_blendshapes.done",
  "response_id": "resp_ABC123",
  "item_id": "item_DEF456",
  "output_index": 0,
}

Properties

Field Type Description
type string The event type must be response.animation_blendshapes.done.
response_id string ID of the response
item_id string ID of the item
output_index integer Index of the item in the response

response.audio_timestamp.delta

The server response.audio_timestamp.delta event is returned when the model generates audio timestamp data as part of a response. This event provides incremental timestamp data for output audio and text alignment as it becomes available.

Event structure

{
  "type": "response.audio_timestamp.delta",
  "response_id": "resp_ABC123",
  "item_id": "item_DEF456",
  "output_index": 0,
  "content_index": 0,
  "audio_offset_ms": 0,
  "audio_duration_ms": 500,
  "text": "Hello",
  "timestamp_type": "word"
}

Properties

Field Type Description
type string The event type must be response.audio_timestamp.delta.
response_id string ID of the response
item_id string ID of the item
output_index integer Index of the item in the response
content_index integer Index of the content part
audio_offset_ms integer Audio offset in milliseconds from the start of the audio
audio_duration_ms integer Duration of the audio segment in milliseconds
text string The text segment corresponding to this audio timestamp
timestamp_type string The type of timestamp. Currently only "word" is supported.
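
The audio_offset_ms and audio_duration_ms fields together give the time span of each word in the output audio, which is enough to drive captions or word highlighting. The following is a minimal sketch, assuming the events are already parsed into dictionaries. The word_spans and active_word function names are hypothetical.

```python
def word_spans(events):
    """Collect (text, start_ms, end_ms) tuples from timestamp delta events."""
    spans = []
    for event in events:
        if event.get("type") != "response.audio_timestamp.delta":
            continue
        start = event["audio_offset_ms"]
        spans.append((event["text"], start, start + event["audio_duration_ms"]))
    return spans


def active_word(spans, position_ms):
    """Return the word whose span contains position_ms, or None."""
    for text, start, end in spans:
        if start <= position_ms < end:
            return text
    return None
```

Here position_ms stands for the client's current playback position, measured from the start of the audio for the same content part.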

response.audio_timestamp.done

Sent when audio timestamp generation is complete.

Event structure

{
  "type": "response.audio_timestamp.done",
  "response_id": "resp_ABC123",
  "item_id": "item_DEF456",
  "output_index": 0,
  "content_index": 0
}

Properties

Field Type Description
type string The event type must be response.audio_timestamp.done.
response_id string ID of the response
item_id string ID of the item
output_index integer Index of the item in the response
content_index integer Index of the content part

response.animation_viseme.delta

The server response.animation_viseme.delta event is returned when the model generates animation viseme data as part of a response. This event provides incremental viseme data as it becomes available.

Event structure

{
  "type": "response.animation_viseme.delta",
  "response_id": "resp_ABC123",
  "item_id": "item_DEF456",
  "output_index": 0,
  "content_index": 0,
  "audio_offset_ms": 0,
  "viseme_id": 1
}

Properties

Field Type Description
type string The event type must be response.animation_viseme.delta.
response_id string ID of the response
item_id string ID of the item
output_index integer Index of the item in the response
content_index integer Index of the content part
audio_offset_ms integer Audio offset in milliseconds from the start of the audio
viseme_id integer The viseme ID corresponding to the mouth shape for animation

response.animation_viseme.done

The server response.animation_viseme.done event is returned when the model has finished generating animation viseme data as part of a response.

Event structure

{
  "type": "response.animation_viseme.done",
  "response_id": "resp_ABC123",
  "item_id": "item_DEF456",
  "output_index": 0,
  "content_index": 0
}

Properties

Field Type Description
type string The event type must be response.animation_viseme.done.
response_id string ID of the response
item_id string ID of the item
output_index integer Index of the item in the response
content_index integer Index of the content part

error

The server error event is returned when an error occurs, which could be a client problem or a server problem. Most errors are recoverable and the session stays open.

Event structure

{
  "type": "error",
  "error": {
    "code": "<code>",
    "message": "<message>",
    "param": "<param>",
    "event_id": "<event_id>"
  }
}

Properties

Field Type Description
type string The event type must be error.
error object Details of the error. See nested properties in the next table.

Error properties

Field Type Description
type string The type of error. For example, "invalid_request_error" and "server_error" are error types.
code string Error code, if any.
message string A human-readable error message.
param string Parameter related to the error, if any.
event_id string The ID of the client event that caused the error, if applicable.
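
The event_id field makes it possible to trace an error back to the client event that caused it. The following is a minimal sketch, assuming the client records each event it sends in a dictionary keyed by the event ID it assigned. The describe_error function and the sent_events mapping are hypothetical, and the error code used in the usage lines is a placeholder.

```python
def describe_error(event: dict, sent_events: dict) -> str:
    """Build a log line that names the client event behind an error."""
    err = event["error"]
    # event_id is only present when a client event caused the error.
    origin = sent_events.get(err.get("event_id"))
    origin_type = origin["type"] if origin else "unknown client event"
    return (
        f"{err.get('type', 'error')}/{err.get('code')}: "
        f"{err.get('message')} (caused by {origin_type})"
    )


# Usage with placeholder values.
sent = {"evt_1": {"type": "session.update"}}
line = describe_error(
    {
        "type": "error",
        "error": {
            "type": "invalid_request_error",
            "code": "placeholder_code",
            "message": "Bad value",
            "param": None,
            "event_id": "evt_1",
        },
    },
    sent,
)
```

Because most errors leave the session open, logging the line and continuing is often sufficient. Whether to retry the originating event is an application decision.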

warning

The server warning event is returned when a warning occurs that doesn't interrupt the conversation flow. Warnings are informational and the session continues normally.

Event structure

{
  "type": "warning",
  "warning": {
    "code": "<code>",
    "message": "<message>",
    "param": "<param>"
  }
}

Properties

Field Type Description
type string The event type must be warning.
warning object Details of the warning. See nested properties in the next table.

Warning properties

Field Type Description
message string A human-readable warning message.
code string Optional. Warning code, if any.
param string Optional. Parameter related to the warning, if any.

input_audio_buffer.cleared

The server input_audio_buffer.cleared event is returned when the client clears the input audio buffer with an input_audio_buffer.clear event.

Event structure

{
  "type": "input_audio_buffer.cleared"
}

Properties

Field Type Description
type string The event type must be input_audio_buffer.cleared.

input_audio_buffer.committed

The server input_audio_buffer.committed event is returned when an input audio buffer is committed, either by the client or automatically in server VAD mode. The item_id property is the ID of the user message item that is created, so a conversation.item.created event is also sent to the client.

Event structure

{
  "type": "input_audio_buffer.committed",
  "previous_item_id": "<previous_item_id>",
  "item_id": "<item_id>"
}

Properties

Field Type Description
type string The event type must be input_audio_buffer.committed.
previous_item_id string The ID of the preceding item after which the new item is inserted.
item_id string The ID of the user message item created.

input_audio_buffer.speech_started

The server input_audio_buffer.speech_started event is returned in server_vad mode when speech is detected in the audio buffer. This event can happen any time audio is added to the buffer (unless speech is already detected).

Note

The client might want to use this event to interrupt audio playback or provide visual feedback to the user.

The client should expect to receive an input_audio_buffer.speech_stopped event when speech stops. The item_id property is the ID of the user message item created when speech stops. The item_id is also included in the input_audio_buffer.speech_stopped event unless the client manually commits the audio buffer during VAD activation.

Event structure

{
  "type": "input_audio_buffer.speech_started",
  "audio_start_ms": 0,
  "item_id": "<item_id>"
}

Properties

Field Type Description
type string The event type must be input_audio_buffer.speech_started.
audio_start_ms integer Milliseconds from the start of all audio written to the buffer during the session when speech was first detected. This property corresponds to the beginning of audio sent to the model, and thus includes the prefix_padding_ms configured in the session.
item_id string The ID of the user message item created when speech stops.

input_audio_buffer.speech_stopped

The server input_audio_buffer.speech_stopped event is returned in server_vad mode when the server detects the end of speech in the audio buffer.

The server also sends a conversation.item.created event with the user message item created from the audio buffer.

Event structure

{
  "type": "input_audio_buffer.speech_stopped",
  "audio_end_ms": 0,
  "item_id": "<item_id>"
}

Properties

Field Type Description
type string The event type must be input_audio_buffer.speech_stopped.
audio_end_ms integer Milliseconds since the session started when speech stopped. This property corresponds to the end of audio sent to the model, and thus includes the min_silence_duration_ms configured in the session.
item_id string The ID of the user message item created.
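
The speech_started and speech_stopped events can be paired by item_id to interrupt playback and to measure each detected speech segment. The following is a minimal sketch, assuming that audio_start_ms and audio_end_ms share the same time base. The SpeechTracker class is a hypothetical name, and the decision to pause playback on every speech_started event is an application choice rather than an API requirement.

```python
class SpeechTracker:
    """Hypothetical helper that pairs speech_started with speech_stopped."""

    def __init__(self):
        self.active = {}    # item_id -> audio_start_ms
        self.segments = {}  # item_id -> (audio_start_ms, audio_end_ms)

    def handle(self, event: dict) -> bool:
        """Return True when the client should pause assistant playback."""
        kind = event.get("type")
        if kind == "input_audio_buffer.speech_started":
            self.active[event["item_id"]] = event["audio_start_ms"]
            return True
        if kind == "input_audio_buffer.speech_stopped":
            start = self.active.pop(event["item_id"], None)
            if start is not None:
                self.segments[event["item_id"]] = (start, event["audio_end_ms"])
        return False
```

A stopped event without a matching started event is ignored here, which covers the case where the client commits the buffer manually.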

rate_limits.updated

The server rate_limits.updated event is emitted at the beginning of a response to indicate the updated rate limits.

When a response is created, some tokens are reserved for the output. The rate limits shown here reflect that reservation, which is then adjusted once the response is completed.

Event structure

{
  "type": "rate_limits.updated",
  "rate_limits": [
    {
      "name": "<name>",
      "limit": 0,
      "remaining": 0,
      "reset_seconds": 0
    }
  ]
}

Properties

Field Type Description
type string The event type must be rate_limits.updated.
rate_limits array of RealtimeRateLimitsItem The list of rate limit information.
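
A client can index the rate_limits array by name to decide whether to delay the next request. The following is a minimal sketch that relies only on the fields shown in the event structure. The function names are hypothetical, and the limit name used in any call, such as "tokens", is an assumption because this section shows only a placeholder for name.

```python
def limits_by_name(event: dict) -> dict:
    """Index the rate limit entries of a rate_limits.updated event by name."""
    if event.get("type") != "rate_limits.updated":
        raise ValueError("not a rate_limits.updated event")
    return {item["name"]: item for item in event["rate_limits"]}


def is_exhausted(event: dict, name: str) -> bool:
    """True when the named limit is present and has nothing remaining."""
    item = limits_by_name(event).get(name)
    return item is not None and item["remaining"] <= 0
```

When a limit is exhausted, waiting for the entry's reset_seconds value before sending more work is one reasonable policy.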

response.audio.delta

The server response.audio.delta event is returned when the model-generated audio is updated.

Event structure

{
  "type": "response.audio.delta",
  "response_id": "<response_id>",
  "item_id": "<item_id>",
  "output_index": 0,
  "content_index": 0,
  "delta": "<delta>"
}

Properties

Field Type Description
type string The event type must be response.audio.delta.
response_id string The ID of the response.
item_id string The ID of the item.
output_index integer The index of the output item in the response.
content_index integer The index of the content part in the item's content array.
delta string Base64-encoded audio data delta.

response.audio.done

The server response.audio.done event is returned when the model-generated audio is done.

This event is also returned when a response is interrupted, incomplete, or cancelled.

Event structure

{
  "type": "response.audio.done",
  "response_id": "<response_id>",
  "item_id": "<item_id>",
  "output_index": 0,
  "content_index": 0
}

Properties

Field Type Description
type string The event type must be response.audio.done.
response_id string The ID of the response.
item_id string The ID of the item.
output_index integer The index of the output item in the response.
content_index integer The index of the content part in the item's content array.

response.audio_transcript.delta

The server response.audio_transcript.delta event is returned when the model-generated transcription of audio output is updated.

Event structure

{
  "type": "response.audio_transcript.delta",
  "response_id": "<response_id>",
  "item_id": "<item_id>",
  "output_index": 0,
  "content_index": 0,
  "delta": "<delta>"
}

Properties

Field Type Description
type string The event type must be response.audio_transcript.delta.
response_id string The ID of the response.
item_id string The ID of the item.
output_index integer The index of the output item in the response.
content_index integer The index of the content part in the item's content array.
delta string The transcript delta.

response.audio_transcript.done

The server response.audio_transcript.done event is returned when the model-generated transcription of audio output is done streaming.

This event is also returned when a response is interrupted, incomplete, or cancelled.

Event structure

{
  "type": "response.audio_transcript.done",
  "response_id": "<response_id>",
  "item_id": "<item_id>",
  "output_index": 0,
  "content_index": 0,
  "transcript": "<transcript>"
}

Properties

Field Type Description
type string The event type must be response.audio_transcript.done.
response_id string The ID of the response.
item_id string The ID of the item.
output_index integer The index of the output item in the response.
content_index integer The index of the content part in the item's content array.
transcript string The final transcript of the audio.

response.function_call_arguments.delta

The server response.function_call_arguments.delta event is returned when the model-generated function call arguments are updated.

Event structure

{
  "type": "response.function_call_arguments.delta",
  "response_id": "<response_id>",
  "item_id": "<item_id>",
  "output_index": 0,
  "call_id": "<call_id>",
  "delta": "<delta>"
}

Properties

Field Type Description
type string The event type must be response.function_call_arguments.delta.
response_id string The ID of the response.
item_id string The ID of the function call item.
output_index integer The index of the output item in the response.
call_id string The ID of the function call.
delta string The arguments delta as a JSON string.

response.function_call_arguments.done

The server response.function_call_arguments.done event is returned when the model-generated function call arguments are done streaming.

This event is also returned when a response is interrupted, incomplete, or cancelled.

Event structure

{
  "type": "response.function_call_arguments.done",
  "response_id": "<response_id>",
  "item_id": "<item_id>",
  "output_index": 0,
  "call_id": "<call_id>",
  "arguments": "<arguments>"
}

Properties

Field Type Description
type string The event type must be response.function_call_arguments.done.
response_id string The ID of the response.
item_id string The ID of the function call item.
output_index integer The index of the output item in the response.
call_id string The ID of the function call.
arguments string The final arguments as a JSON string.
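
Function call arguments arrive as fragments of a JSON string, so they can only be parsed once the done event delivers the complete value. The following is a minimal sketch, assuming events are already parsed into dictionaries. The FunctionCallBuffer class is a hypothetical name. Because the done event is also returned for interrupted or cancelled responses, the sketch treats unparseable arguments as None instead of raising.

```python
import json


class FunctionCallBuffer:
    """Hypothetical helper that tracks streamed function call arguments."""

    def __init__(self):
        self._chunks = {}  # call_id -> list of delta strings

    def pending(self, call_id: str) -> str:
        """Arguments streamed so far for a call that is still open."""
        return "".join(self._chunks.get(call_id, []))

    def handle(self, event: dict):
        """Return (call_id, parsed_arguments) on a done event, else None."""
        kind = event.get("type")
        if kind == "response.function_call_arguments.delta":
            self._chunks.setdefault(event["call_id"], []).append(event["delta"])
            return None
        if kind == "response.function_call_arguments.done":
            self._chunks.pop(event["call_id"], None)
            try:
                parsed = json.loads(event["arguments"])
            except json.JSONDecodeError:
                parsed = None  # for example, a response cut off mid-stream
            return event["call_id"], parsed
        return None
```

The caller can dispatch the tool when parsed is not None and skip or report the call otherwise.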

mcp_list_tools.in_progress

The server mcp_list_tools.in_progress event is returned when the service starts listing available tools from an MCP server.

Event structure

{
  "type": "mcp_list_tools.in_progress",
  "item_id": "<mcp_list_tools_item_id>"
}

Properties

Field Type Description
type string The event type must be mcp_list_tools.in_progress.
item_id string The ID of the MCP list tools item being processed.

mcp_list_tools.completed

The server mcp_list_tools.completed event is returned when the service completes listing available tools from an MCP server.

Event structure

{
  "type": "mcp_list_tools.completed",
  "item_id": "<mcp_list_tools_item_id>"
}

Properties

Field Type Description
type string The event type must be mcp_list_tools.completed.
item_id string The ID of the MCP list tools item being processed.

mcp_list_tools.failed

The server mcp_list_tools.failed event is returned when the service fails to list available tools from an MCP server.

Event structure

{
  "type": "mcp_list_tools.failed",
  "item_id": "<mcp_list_tools_item_id>"
}

Properties

Field Type Description
type string The event type must be mcp_list_tools.failed.
item_id string The ID of the MCP list tools item being processed.

response.mcp_call_arguments.delta

The server response.mcp_call_arguments.delta event is returned when the model-generated MCP tool call arguments are updated.

Event structure

{
  "type": "response.mcp_call_arguments.delta",
  "response_id": "<response_id>",
  "item_id": "<item_id>",
  "output_index": 0,
  "delta": "<delta>"
}

Properties

Field Type Description
type string The event type must be response.mcp_call_arguments.delta.
response_id string The ID of the response.
item_id string The ID of the MCP tool call item.
output_index integer The index of the output item in the response.
delta string The arguments delta as a JSON string.

response.mcp_call_arguments.done

The server response.mcp_call_arguments.done event is returned when the model-generated MCP tool call arguments are done streaming.

Event structure

{
  "type": "response.mcp_call_arguments.done",
  "response_id": "<response_id>",
  "item_id": "<item_id>",
  "output_index": 0,
  "arguments": "<arguments>"
}

Properties

Field Type Description
type string The event type must be response.mcp_call_arguments.done.
response_id string The ID of the response.
item_id string The ID of the MCP tool call item.
output_index integer The index of the output item in the response.
arguments string The final arguments as a JSON string.

response.mcp_call.in_progress

The server response.mcp_call.in_progress event is returned when an MCP tool call starts processing.

Event structure

{
  "type": "response.mcp_call.in_progress",
  "item_id": "<item_id>",
  "output_index": 0
}

Properties

Field Type Description
type string The event type must be response.mcp_call.in_progress.
item_id string The ID of the MCP tool call item.
output_index integer The index of the output item in the response.

response.mcp_call.completed

The server response.mcp_call.completed event is returned when an MCP tool call completes successfully.

Event structure

{
  "type": "response.mcp_call.completed",
  "item_id": "<item_id>",
  "output_index": 0
}

Properties

Field Type Description
type string The event type must be response.mcp_call.completed.
item_id string The ID of the MCP tool call item.
output_index integer The index of the output item in the response.

response.mcp_call.failed

The server response.mcp_call.failed event is returned when an MCP tool call fails.

Event structure

{
  "type": "response.mcp_call.failed",
  "item_id": "<item_id>",
  "output_index": 0
}

Properties

Field Type Description
type string The event type must be response.mcp_call.failed.
item_id string The ID of the MCP tool call item.
output_index integer The index of the output item in the response.
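
The in_progress, completed, and failed events describe the lifecycle of a single MCP tool call, so a client can keep one status per item_id. The following is a minimal sketch, assuming events are already parsed into dictionaries. The update_mcp_status function and the statuses mapping are hypothetical names.

```python
MCP_CALL_PREFIX = "response.mcp_call."
MCP_CALL_STATUSES = {"in_progress", "completed", "failed"}


def update_mcp_status(statuses: dict, event: dict) -> None:
    """Record the latest lifecycle status for an MCP tool call item."""
    kind = event.get("type", "")
    # response.mcp_call_arguments.* events don't match this prefix
    # because the prefix ends with a period.
    if not kind.startswith(MCP_CALL_PREFIX):
        return
    status = kind[len(MCP_CALL_PREFIX):]
    if status in MCP_CALL_STATUSES:
        statuses[event["item_id"]] = status
```

A UI could use the mapping to show a progress indicator while a call is in_progress and clear it on completed or failed.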

response.foundry_agent_call_arguments.delta

The server response.foundry_agent_call_arguments.delta event is returned when the model-generated foundry agent call arguments are updated.

Event structure

{
  "type": "response.foundry_agent_call_arguments.delta",
  "response_id": "<response_id>",
  "item_id": "<item_id>",
  "output_index": 0,
  "delta": "<delta>"
}

Properties

Field Type Description
type string The event type must be response.foundry_agent_call_arguments.delta.
response_id string The ID of the response.
item_id string The ID of the foundry agent call item.
output_index integer The index of the output item in the response.
delta string The arguments delta as a JSON string.

response.foundry_agent_call_arguments.done

The server response.foundry_agent_call_arguments.done event is returned when the model-generated foundry agent call arguments are done streaming.

Event structure

{
  "type": "response.foundry_agent_call_arguments.done",
  "response_id": "<response_id>",
  "item_id": "<item_id>",
  "output_index": 0,
  "arguments": "<arguments>"
}

Properties

Field Type Description
type string The event type must be response.foundry_agent_call_arguments.done.
response_id string The ID of the response.
item_id string The ID of the foundry agent call item.
output_index integer The index of the output item in the response.
arguments string The final arguments as a JSON string.

response.foundry_agent_call.in_progress

The server response.foundry_agent_call.in_progress event is returned when a foundry agent call starts processing.

Event structure

{
  "type": "response.foundry_agent_call.in_progress",
  "item_id": "<item_id>",
  "agent_response_id": "<agent_response_id>",
  "output_index": 0
}

Properties

Field Type Description
type string The event type must be response.foundry_agent_call.in_progress.
item_id string The ID of the foundry agent call item.
agent_response_id string The response ID from the foundry agent.
output_index integer The index of the output item in the response.

response.foundry_agent_call.completed

The server response.foundry_agent_call.completed event is returned when a foundry agent call completes successfully.

Event structure

{
  "type": "response.foundry_agent_call.completed",
  "item_id": "<item_id>",
  "agent_response_id": "<agent_response_id>",
  "output_index": 0
}

Properties

Field Type Description
type string The event type must be response.foundry_agent_call.completed.
item_id string The ID of the foundry agent call item.
agent_response_id string The response ID from the foundry agent.
output_index integer The index of the output item in the response.

response.foundry_agent_call.failed

The server response.foundry_agent_call.failed event is returned when a foundry agent call fails.

Event structure

{
  "type": "response.foundry_agent_call.failed",
  "item_id": "<item_id>",
  "output_index": 0
}

Properties

Field Type Description
type string The event type must be response.foundry_agent_call.failed.
item_id string The ID of the foundry agent call item.
output_index integer The index of the output item in the response.

response.output_item.added

The server response.output_item.added event is returned when a new item is created during response generation.

Event structure

{
  "type": "response.output_item.added",
  "response_id": "<response_id>",
  "output_index": 0
}

Properties

Field Type Description
type string The event type must be response.output_item.added.
response_id string The ID of the response to which the item belongs.
output_index integer The index of the output item in the response.
item RealtimeConversationResponseItem The item that was added.

response.output_item.done

The server response.output_item.done event is returned when an item is done streaming.

This event is also returned when a response is interrupted, incomplete, or cancelled.

Event structure

{
  "type": "response.output_item.done",
  "response_id": "<response_id>",
  "output_index": 0
}

Properties

Field Type Description
type string The event type must be response.output_item.done.
response_id string The ID of the response to which the item belongs.
output_index integer The index of the output item in the response.
item RealtimeConversationResponseItem The item that is done streaming.

response.text.delta

The server response.text.delta event is returned when the model-generated text is updated. The text corresponds to the text content part of an assistant message item.

Event structure

{
  "type": "response.text.delta",
  "response_id": "<response_id>",
  "item_id": "<item_id>",
  "output_index": 0,
  "content_index": 0,
  "delta": "<delta>"
}

Properties

Field Type Description
type string The event type must be response.text.delta.
response_id string The ID of the response.
item_id string The ID of the item.
output_index integer The index of the output item in the response.
content_index integer The index of the content part in the item's content array.
delta string The text delta.

response.text.done

The server response.text.done event is returned when the model-generated text is done streaming. The text corresponds to the text content part of an assistant message item.

This event is also returned when a response is interrupted, incomplete, or cancelled.

Event structure

{
  "type": "response.text.done",
  "response_id": "<response_id>",
  "item_id": "<item_id>",
  "output_index": 0,
  "content_index": 0,
  "text": "<text>"
}

Properties

Field Type Description
type string The event type must be response.text.done.
response_id string The ID of the response.
item_id string The ID of the item.
output_index integer The index of the output item in the response.
content_index integer The index of the content part in the item's content array.
text string The final text content.

session.avatar.switch_to_speaking

Returned when the avatar transitions to the speaking state. Use this event to coordinate UI changes such as showing a speaking indicator.

Event structure

{
  "type": "session.avatar.switch_to_speaking",
  "turn_id": "<turn_id>"
}

Properties

Field Type Description
type string The event type must be session.avatar.switch_to_speaking.
turn_id string Optional. The ID of the turn associated with the avatar state change.

session.avatar.switch_to_idle

Returned when the avatar transitions to the idle state.

Event structure

{
  "type": "session.avatar.switch_to_idle",
  "turn_id": "<turn_id>"
}

Properties

Field Type Description
type string The event type must be session.avatar.switch_to_idle.
turn_id string Optional. The ID of the turn associated with the avatar state change.

response.video.delta

Returned when avatar video frame data is streamed to the client. The frame payload is base64-encoded and uses the codec indicated by the codec field.

Event structure

{
  "type": "response.video.delta",
  "output_index": 0,
  "codec": "h264",
  "delta": "<base64_encoded_video_frame>"
}

Properties

Field Type Description
type string The event type must be response.video.delta.
output_index integer The index of the output item in the response.
codec string The codec used for the video data (for example, h264).
delta string The base64-encoded video frame data.

response.web_search_call.searching

Returned when a web search tool call enters the searching state.

Event structure

{
  "type": "response.web_search_call.searching",
  "response_id": "<response_id>",
  "item_id": "<item_id>",
  "output_index": 0,
  "sequence_number": 0
}

Properties

Field Type Description
type string The event type must be response.web_search_call.searching.
response_id string The ID of the response.
item_id string The ID of the web search call item.
output_index integer The index of the output item in the response.
sequence_number integer The sequence number of the web search call.

response.web_search_call.in_progress

Returned when a web search tool call is in progress.

Event structure

{
  "type": "response.web_search_call.in_progress",
  "response_id": "<response_id>",
  "item_id": "<item_id>",
  "output_index": 0,
  "sequence_number": 0
}

Properties

Field Type Description
type string The event type must be response.web_search_call.in_progress.
response_id string The ID of the response.
item_id string The ID of the web search call item.
output_index integer The index of the output item in the response.
sequence_number integer The sequence number of the web search call.

response.web_search_call.completed

Returned when a web search tool call has completed.

Event structure

{
  "type": "response.web_search_call.completed",
  "response_id": "<response_id>",
  "item_id": "<item_id>",
  "output_index": 0,
  "sequence_number": 0
}

Properties

Field Type Description
type string The event type must be response.web_search_call.completed.
response_id string The ID of the response.
item_id string The ID of the web search call item.
output_index integer The index of the output item in the response.
sequence_number integer The sequence number of the web search call.

response.file_search_call.searching

Returned when a file search tool call enters the searching state.

Event structure

{
  "type": "response.file_search_call.searching",
  "response_id": "<response_id>",
  "item_id": "<item_id>",
  "output_index": 0,
  "sequence_number": 0
}

Properties

Field Type Description
type string The event type must be response.file_search_call.searching.
response_id string The ID of the response.
item_id string The ID of the file search call item.
output_index integer The index of the output item in the response.
sequence_number integer The sequence number of the file search call.

response.file_search_call.in_progress

Returned when a file search tool call is in progress.

Event structure

{
  "type": "response.file_search_call.in_progress",
  "response_id": "<response_id>",
  "item_id": "<item_id>",
  "output_index": 0,
  "sequence_number": 0
}

Properties

Field Type Description
type string The event type must be response.file_search_call.in_progress.
response_id string The ID of the response.
item_id string The ID of the file search call item.
output_index integer The index of the output item in the response.
sequence_number integer The sequence number of the file search call.

response.file_search_call.completed

Returned when a file search tool call has completed.

Event structure

{
  "type": "response.file_search_call.completed",
  "response_id": "<response_id>",
  "item_id": "<item_id>",
  "output_index": 0,
  "sequence_number": 0
}

Properties

Field Type Description
type string The event type must be response.file_search_call.completed.
response_id string The ID of the response.
item_id string The ID of the file search call item.
output_index integer The index of the output item in the response.
sequence_number integer The sequence number of the file search call.
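
The web search and file search events above share the same shape, so a client can track them with one handler. The following is a sketch only: update_search_state and the states dictionary are hypothetical names, and the code assumes events have already been parsed from JSON into dictionaries.

```python
# Event type prefixes taken from the search call events documented above.
SEARCH_EVENT_PREFIXES = (
    "response.web_search_call.",
    "response.file_search_call.",
)


def update_search_state(states: dict[str, str], event: dict) -> dict[str, str]:
    """Record the latest state (searching, in_progress, completed) per item_id."""
    event_type = event.get("type", "")
    for prefix in SEARCH_EVENT_PREFIXES:
        if event_type.startswith(prefix):
            # The state is the last segment of the event type.
            states[event["item_id"]] = event_type[len(prefix):]
            break
    # Events that aren't search call events leave the mapping unchanged.
    return states
```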

output_audio_buffer.cleared

Returned when the output audio buffer is cleared in response to a client output_audio_buffer.clear event. In the current preview, this event is only emitted in avatar mode.

Event structure

{
  "type": "output_audio_buffer.cleared"
}

Properties

Field Type Description
type string The event type must be output_audio_buffer.cleared.

response.audio_transcript.annotation.added

Returned when an annotation (for example, a citation produced by a web or file search tool) is added to an audio transcript content part.

Event structure

{
  "type": "response.audio_transcript.annotation.added",
  "response_id": "<response_id>",
  "item_id": "<item_id>",
  "output_index": 0,
  "content_index": 0,
  "annotation_index": 0,
  "annotation": {}
}

Properties

Field Type Description
type string The event type must be response.audio_transcript.annotation.added.
response_id string The ID of the response.
item_id string The ID of the item.
output_index integer The index of the output item in the response.
content_index integer The index of the content part in the item's content array.
annotation_index integer The index of the annotation.
annotation object The annotation object. The schema depends on the annotation source (for example, web search citation).

Components

Audio Formats

RealtimeAudioFormat

Base audio format used for input audio.

Allowed Values:

  • pcm16 - 16-bit PCM audio format
  • g711_ulaw - G.711 μ-law audio format
  • g711_alaw - G.711 A-law audio format

RealtimeOutputAudioFormat

Audio format used for output audio with specific sampling rates.

Allowed Values:

  • pcm16 - 16-bit PCM audio format at default sampling rate (24kHz)
  • pcm16_8000hz - 16-bit PCM audio format at 8kHz sampling rate
  • pcm16_16000hz - 16-bit PCM audio format at 16kHz sampling rate
  • g711_ulaw - G.711 μ-law (mu-law) audio format at 8kHz sampling rate
  • g711_alaw - G.711 A-law audio format at 8kHz sampling rate

RealtimeAudioInputTranscriptionSettings

Configuration for input audio transcription.

Field Type Description
model string The transcription model.
Supported with gpt-realtime and gpt-realtime-mini:
whisper-1, gpt-4o-transcribe, gpt-4o-mini-transcribe, gpt-4o-transcribe-diarize, mai-transcribe-1.
Supported with all other models and agents: azure-speech and mai-transcribe-1.
language string Optional language code in BCP-47 format (for example, en-US) or ISO-639-1 format (for example, en), or a comma-separated list of languages for automatic detection (for example, en,zh).

See Azure speech to text supported languages for recommended usage of this setting.
custom_speech object Optional configuration for custom speech models, only valid for azure-speech model.
phrase_list string[] Optional list of phrase hints to bias recognition, only valid for azure-speech model.
prompt string Optional prompt text to guide transcription, only valid for whisper-1, gpt-4o-transcribe, gpt-4o-mini-transcribe and gpt-4o-transcribe-diarize models.
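
The model-specific restrictions in the table above can be checked on the client before sending the settings. This is a sketch under the assumption that settings are held as a plain dictionary; the helper name invalid_transcription_fields is hypothetical and not part of the API.

```python
# Models that accept the prompt field, per the table above.
PROMPT_MODELS = {
    "whisper-1",
    "gpt-4o-transcribe",
    "gpt-4o-mini-transcribe",
    "gpt-4o-transcribe-diarize",
}
# Fields that are only valid with the azure-speech model.
AZURE_SPEECH_ONLY_FIELDS = ("custom_speech", "phrase_list")


def invalid_transcription_fields(settings: dict) -> list[str]:
    """Return the fields that aren't valid for the selected transcription model."""
    model = settings.get("model")
    invalid = []
    for field in AZURE_SPEECH_ONLY_FIELDS:
        if field in settings and model != "azure-speech":
            invalid.append(field)
    if "prompt" in settings and model not in PROMPT_MODELS:
        invalid.append("prompt")
    return invalid
```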

RealtimeInputAudioNoiseReductionSettings

This can be:

RealtimeOpenAINoiseReduction

OpenAI noise reduction configuration with explicit type field, only available for gpt-realtime and gpt-realtime-mini models.

Field Type Description
type string near_field or far_field

RealtimeAzureDeepNoiseSuppression

Configuration for input audio noise reduction.

Field Type Description
type string Must be "azure_deep_noise_suppression"

RealtimeInputAudioEchoCancellationSettings

Echo cancellation configuration for server-side audio processing.

Field Type Description
type string Must be "server_echo_cancellation"

Voice Configuration

RealtimeVoice

Union of all supported voice configurations.

This can be:

RealtimeOpenAIVoice

OpenAI voice configuration with explicit type field.

Field Type Description
type string Must be "openai"
name string OpenAI voice name: alloy, ash, ballad, coral, echo, sage, shimmer, verse, marin, cedar

RealtimeAzureVoice

Base for Azure voice configurations. This is a discriminated union with different types:

RealtimeAzureStandardVoice

Azure standard voice configuration.

Field Type Description
type string Must be "azure-standard"
name string Voice name (can't be empty)
temperature number Optional. Temperature between 0.0 and 1.0
custom_lexicon_url string Optional. URL to custom lexicon
custom_text_normalization_url string Optional. URL to custom text normalization
prefer_locales string[] Optional. Preferred locales
Preferred locales change the accent used for each language. If the value isn't set, TTS uses the default accent of each language. For example, when speaking English, TTS uses the American English accent, and when speaking Spanish, it uses the Mexican Spanish accent.
If prefer_locales is set to ["en-GB", "es-ES"], the English accent is British English and the Spanish accent is European Spanish. TTS can still speak other languages, such as French and Chinese.
locale string Optional. Locale specification
Enforces the locale for TTS output. If set, TTS always uses the given locale to speak. For example, if locale is set to en-US, TTS always uses the American English accent to speak the text content, even if the text content is in another language, and TTS outputs silence if the text content is in Chinese.
style string Optional. Voice style
pitch string Optional. Pitch adjustment for the voice output. Follows the same rules as the pitch attribute of the SSML prosody element (see Adjust prosody). Typical values: a named level (x-low, low, medium, high, x-high, default), a relative change (for example +10%, -5%, +50Hz, -2st), or an absolute frequency (for example 200Hz).
rate string Optional. Speaking rate adjustment for the voice output. Follows the same rules as the rate attribute of the SSML prosody element (see Adjust prosody). Typical values: a named level (x-slow, slow, medium, fast, x-fast, default), a relative percentage (for example +20%, -10%), or a non-negative multiplier (for example 0.5, 1.5).
volume string Optional. Volume adjustment for the voice output. Follows the same rules as the volume attribute of the SSML prosody element (see Adjust prosody). Typical values: a named level (silent, x-soft, soft, medium, loud, x-loud, default), an absolute number from 0.0 to 100.0, or a relative change (for example +10, -6dB).

RealtimeAzureCustomVoice

Azure custom voice configuration (preferred for custom voices).

Field Type Description
type string Must be "azure-custom"
name string Voice name (can't be empty)
endpoint_id string Endpoint ID (can't be empty)
temperature number Optional. Temperature between 0.0 and 1.0
custom_lexicon_url string Optional. URL to custom lexicon
custom_text_normalization_url string Optional. URL to custom text normalization
prefer_locales string[] Optional. Preferred locales
Preferred locales change the accent used for each language. If the value isn't set, TTS uses the default accent of each language. For example, when speaking English, TTS uses the American English accent, and when speaking Spanish, it uses the Mexican Spanish accent.
If prefer_locales is set to ["en-GB", "es-ES"], the English accent is British English and the Spanish accent is European Spanish. TTS can still speak other languages, such as French and Chinese.
locale string Optional. Locale specification
Enforces the locale for TTS output. If set, TTS always uses the given locale to speak. For example, if locale is set to en-US, TTS always uses the American English accent to speak the text content, even if the text content is in another language, and TTS outputs silence if the text content is in Chinese.
style string Optional. Voice style
pitch string Optional. Pitch adjustment for the voice output. Follows the same rules as the pitch attribute of the SSML prosody element (see Adjust prosody). Typical values: a named level (x-low, low, medium, high, x-high, default), a relative change (for example +10%, -5%, +50Hz, -2st), or an absolute frequency (for example 200Hz).
rate string Optional. Speaking rate adjustment for the voice output. Follows the same rules as the rate attribute of the SSML prosody element (see Adjust prosody). Typical values: a named level (x-slow, slow, medium, fast, x-fast, default), a relative percentage (for example +20%, -10%), or a non-negative multiplier (for example 0.5, 1.5).
volume string Optional. Volume adjustment for the voice output. Follows the same rules as the volume attribute of the SSML prosody element (see Adjust prosody). Typical values: a named level (silent, x-soft, soft, medium, loud, x-loud, default), an absolute number from 0.0 to 100.0, or a relative change (for example +10, -6dB).

Example:

{
  "type": "azure-custom",
  "name": "my-custom-voice",
  "endpoint_id": "12345678-1234-1234-1234-123456789012",
  "temperature": 0.7,
  "style": "cheerful",
  "locale": "en-US"
}

RealtimeAzurePersonalVoice

Azure personal voice configuration.

Field Type Description
type string Must be "azure-personal"
name string Voice name (can't be empty)
temperature number Optional. Temperature between 0.0 and 1.0
model string Underlying base model: DragonLatestNeural, DragonHDOmniLatestNeural, MAI-Voice-1
custom_lexicon_url string Optional. URL to custom lexicon
custom_text_normalization_url string Optional. URL to custom text normalization
prefer_locales string[] Optional. Preferred locales
Preferred locales change the accent used for each language. If the value isn't set, TTS uses the default accent of each language. For example, when speaking English, TTS uses the American English accent, and when speaking Spanish, it uses the Mexican Spanish accent.
If prefer_locales is set to ["en-GB", "es-ES"], the English accent is British English and the Spanish accent is European Spanish. TTS can still speak other languages, such as French and Chinese.
locale string Optional. Locale specification
Enforces the locale for TTS output. If set, TTS always uses the given locale to speak. For example, if locale is set to en-US, TTS always uses the American English accent to speak the text content, even if the text content is in another language, and TTS outputs silence if the text content is in Chinese.
pitch string Optional. Pitch adjustment for the voice output. Follows the same rules as the pitch attribute of the SSML prosody element (see Adjust prosody). Typical values: a named level (x-low, low, medium, high, x-high, default), a relative change (for example +10%, -5%, +50Hz, -2st), or an absolute frequency (for example 200Hz).
rate string Optional. Speaking rate adjustment for the voice output. Follows the same rules as the rate attribute of the SSML prosody element (see Adjust prosody). Typical values: a named level (x-slow, slow, medium, fast, x-fast, default), a relative percentage (for example +20%, -10%), or a non-negative multiplier (for example 0.5, 1.5).
volume string Optional. Volume adjustment for the voice output. Follows the same rules as the volume attribute of the SSML prosody element (see Adjust prosody). Typical values: a named level (silent, x-soft, soft, medium, loud, x-loud, default), an absolute number from 0.0 to 100.0, or a relative change (for example +10, -6dB).

RealtimeAzureRealtimeNativeVoice

Voice configuration for the azure-realtime model. The azure-realtime model accepts only azure-realtime-native voices, and azure-realtime-native voices aren't accepted by other models.

Field Type Description
type string Must be "azure-realtime-native"
name string Voice name. One of aarti, andrew, ava (default), denise, elsa, florian, francisca, meera, ximena, xiaoxiao, yunxi. If not specified, ava is used.

Example:

{
  "voice": {
    "type": "azure-realtime-native",
    "name": "ava"
  }
}
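
Because the set of voice names is fixed, a client can reject unknown names before sending them. The sketch below assumes the voice object is built as a dictionary; the helper name azure_realtime_native_voice is hypothetical.

```python
# Voice names listed in the table above; ava is the documented default.
AZURE_REALTIME_NATIVE_VOICES = frozenset({
    "aarti", "andrew", "ava", "denise", "elsa", "florian",
    "francisca", "meera", "ximena", "xiaoxiao", "yunxi",
})


def azure_realtime_native_voice(name: str = "ava") -> dict:
    """Build a voice object for the azure-realtime model."""
    if name not in AZURE_REALTIME_NATIVE_VOICES:
        raise ValueError(f"unsupported azure-realtime-native voice: {name!r}")
    return {"type": "azure-realtime-native", "name": name}
```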

Turn Detection

RealtimeTurnDetection

Configuration for turn detection. This is a discriminated union supporting multiple VAD types.

RealtimeServerVAD

Base VAD-based turn detection.

Field Type Description
type string Must be "server_vad"
threshold float Optional. Activation threshold (0.0-1.0) (default: 0.5)
prefix_padding_ms integer Optional. Audio padding before speech starts (default: 300)
silence_duration_ms integer Optional. Silence duration to detect speech end (default: 500)
speech_duration_ms integer Optional. Minimum speech duration (default: 200)
end_of_utterance_detection RealtimeEOUDetection Optional. End-of-utterance detection config
create_response boolean Optional. Enable or disable whether a response is generated (default: true).
interrupt_response boolean Optional. Enable or disable barge-in interruption (default: true).
auto_truncate boolean Optional. Auto-truncate on interruption (default: false)

RealtimeOpenAISemanticVAD

OpenAI semantic VAD configuration which uses a model to determine when the user has finished speaking. Only available for gpt-realtime and gpt-realtime-mini models.

Field Type Description
type string Must be "semantic_vad"
eagerness string Optional. Controls how eager the model is to interrupt the user by tuning the maximum wait timeout. In transcription mode, even if the model doesn't reply, it affects how the audio is chunked.
The following values are allowed:
- auto (default) is equivalent to medium,
- low lets the user take their time to speak,
- high will chunk the audio as soon as possible.

If you want the model to respond more often in conversation mode, or to return transcription events faster in transcription mode, you can set eagerness to high.
On the other hand, if you want to let the user speak uninterrupted in conversation mode, or if you would like larger transcript chunks in transcription mode, you can set eagerness to low.
create_response boolean Optional. Enable or disable whether a response is generated (default: true).
interrupt_response boolean Optional. Enable or disable barge-in interruption (default: true).

RealtimeAzureSemanticVAD

Azure semantic VAD, which determines when the user starts and stops speaking using a semantic speech model, providing more robust detection in noisy environments.

Field Type Description
type string Must be "azure_semantic_vad"
threshold float Optional. Activation threshold (default: 0.5)
prefix_padding_ms integer Optional. Audio padding before speech (default: 300)
silence_duration_ms integer Optional. Silence duration for speech end (default: 500)
end_of_utterance_detection RealtimeEOUDetection Optional. EOU detection config
speech_duration_ms integer Optional. Minimum speech duration (default: 80)
remove_filler_words boolean Optional. Remove filler words (default: false)
languages string[] Optional. Supports English. Other languages are ignored (default: none).
create_response boolean Optional. Enable or disable whether a response is generated (default: true).
interrupt_response boolean Optional. Enable or disable barge-in interruption (default: true).
auto_truncate boolean Optional. Auto-truncate on interruption (default: false)

RealtimeAzureSemanticVADMultilingual

Azure semantic VAD (default variant).

Field Type Description
type string Must be "azure_semantic_vad_multilingual"
threshold float Optional. Activation threshold (default: 0.5)
prefix_padding_ms integer Optional. Audio padding before speech (default: 300)
silence_duration_ms integer Optional. Silence duration for speech end (default: 500)
end_of_utterance_detection RealtimeEOUDetection Optional. EOU detection config
speech_duration_ms integer Optional. Minimum speech duration (default: 80)
remove_filler_words boolean Optional. Remove filler words (default: false)
languages string[] Optional. Supports English, Spanish, French, Italian, German (DE), Japanese, Portuguese, Chinese, Korean, Hindi. Other languages are ignored (default: none).
create_response boolean Optional. Enable or disable whether a response is generated (default: true).
interrupt_response boolean Optional. Enable or disable barge-in interruption (default: true).
auto_truncate boolean Optional. Auto-truncate on interruption (default: false)

SmartEndOfTurnDetection

Audio-based end-of-turn (EOU) detection. Operates directly on the input audio stream rather than text. Use threshold_level and timeout_ms to tune detection.

Field Type Description
model string Must be "smart_end_of_turn_detection"
threshold_level string Optional. Threshold level setting. One of low, medium, high, or default.
timeout_ms integer Optional. Maximum time in milliseconds to wait for more user speech before triggering end-of-turn.

RealtimeEOUDetection

Azure end-of-utterance (EOU) detection indicates when the end user has stopped speaking while allowing for natural pauses. It can significantly reduce premature end-of-turn signals without adding user-perceivable latency.

Field Type Description
model string One of semantic_detection_v1 (supports English) or semantic_detection_v1_multilingual (supports English, Spanish, French, Italian, German (DE), Japanese, Portuguese, Chinese, Korean, and Hindi).
threshold_level string Optional. Detection threshold level: low, medium, high, or default. The default level is equivalent to medium. With a lower setting, the probability that the sentence is complete is higher.
timeout_ms number Optional. Maximum time in milliseconds to wait for more user speech. Defaults to 1000 ms.
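
To illustrate how the EOU settings nest inside a turn detection object, here is a sketch that builds an azure_semantic_vad configuration as a dictionary. The helper name azure_semantic_vad_with_eou is hypothetical; the field names and allowed values come from the tables above.

```python
EOU_MODELS = {"semantic_detection_v1", "semantic_detection_v1_multilingual"}
THRESHOLD_LEVELS = {"low", "medium", "high", "default"}


def azure_semantic_vad_with_eou(
    eou_model: str = "semantic_detection_v1",
    threshold_level: str = "default",
    timeout_ms: int = 1000,  # documented default for RealtimeEOUDetection
) -> dict:
    """Build a turn detection object with end-of-utterance detection enabled."""
    if eou_model not in EOU_MODELS:
        raise ValueError(f"unknown EOU model: {eou_model!r}")
    if threshold_level not in THRESHOLD_LEVELS:
        raise ValueError(f"unknown threshold level: {threshold_level!r}")
    return {
        "type": "azure_semantic_vad",
        "end_of_utterance_detection": {
            "model": eou_model,
            "threshold_level": threshold_level,
            "timeout_ms": timeout_ms,
        },
    }
```

The returned object is intended as the value of the session's turn_detection property.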

Avatar Configuration

RealtimeAvatarConfig

Configuration for avatar streaming and behavior.

Field Type Description
type string Optional. Avatar type. Allowed values: video-avatar, photo-avatar. Default is video-avatar
ice_servers RealtimeIceServer[] Optional. ICE servers for WebRTC
character string Character name or ID for the avatar
style string Optional. Avatar style (emotional tone, speaking style)
customized boolean Whether the avatar is customized
model string Optional. Base model name for the photo avatar. Required if type is photo-avatar. Valid value: vasa-1
video RealtimeVideoParams Optional. Video configuration
scene RealtimeAvatarScene Optional. Configuration for the avatar's zoom level, position, rotation and movement amplitude in the video frame
output_protocol string Optional. Output protocol for avatar streaming. Allowed values: websocket and webrtc. Default is webrtc
output_audit_audio boolean Optional. When enabled, forwards audit audio via WebSocket for review/debugging purposes, even when avatar output is delivered via WebRTC. Default is false

RealtimeIceServer

ICE server configuration for WebRTC connection negotiation.

Field Type Description
urls string[] ICE server URLs (TURN or STUN endpoints)
username string Optional. Username for authentication
credential string Optional. Credential for authentication

RealtimeVideoParams

Video streaming parameters for avatar.

Field Type Description
bitrate integer Optional. Bitrate in bits per second (default: 2000000)
codec string Optional. Video codec, currently only h264 (default: h264)
crop RealtimeVideoCrop Optional. Cropping settings
resolution RealtimeVideoResolution Optional. Resolution settings
background RealtimeVideoBackground Optional. Background settings
gop_size integer Optional. Group of Pictures size (default: 10, range: 1–2000)

RealtimeVideoCrop

Video crop rectangle definition.

Field Type Description
top_left integer[] Top-left corner [x, y], non-negative integers
bottom_right integer[] Bottom-right corner [x, y], non-negative integers

RealtimeVideoResolution

Video resolution specification.

Field Type Description
width integer Width in pixels (must be > 0)
height integer Height in pixels (must be > 0)

RealtimeVideoBackground

Video background configuration. Only one of image_url or color can be set.

Field Type Description
image_url string Optional. URL to a background image
color string Optional. Background color value
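
The mutual exclusion between image_url and color can be enforced when the object is built. This is a sketch only; the helper name video_background is hypothetical, and the format of the color value isn't checked because it isn't specified here.

```python
from typing import Optional


def video_background(
    image_url: Optional[str] = None,
    color: Optional[str] = None,
) -> dict:
    """Build a background object, allowing at most one of image_url or color."""
    if image_url is not None and color is not None:
        raise ValueError("set only one of image_url or color")
    background = {}
    if image_url is not None:
        background["image_url"] = image_url
    if color is not None:
        background["color"] = color
    return background
```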

RealtimeAvatarScene

Configuration for avatar's zoom level, position, rotation and movement amplitude in the video frame.

Field Type Description
zoom number Optional. Zoom level of the avatar. Range is (0, +∞). Values less than 1 zoom out, values greater than 1 zoom in. Default is 0
position_x number Optional. Horizontal position of the avatar. Range is [-1, 1], as a proportion of frame width. Negative values move left, positive values move right. Default is 0
position_y number Optional. Vertical position of the avatar. Range is [-1, 1], as a proportion of frame height. Negative values move up, positive values move down. Default is 0
rotation_x number Optional. Rotation around the X-axis (pitch). Range is [-π, π] in radians. Negative values rotate up, positive values rotate down. Default is 0
rotation_y number Optional. Rotation around the Y-axis (yaw). Range is [-π, π] in radians. Negative values rotate left, positive values rotate right. Default is 0
rotation_z number Optional. Rotation around the Z-axis (roll). Range is [-π, π] in radians. Negative values rotate anticlockwise, positive values rotate clockwise. Default is 0
amplitude number Optional. Amplitude of the avatar movement. Range is (0, 1]. Values in (0, 1) mean reduced amplitude, 1 means full amplitude. Default is 0
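
The ranges in the table above mix open and closed bounds, which is easy to get wrong. The following sketch checks only the fields that are present in a scene dictionary; the helper name scene_range_errors and the table layout are hypothetical.

```python
import math

# field: (low, high, low_inclusive, high_inclusive), per the table above.
SCENE_RANGES = {
    "zoom": (0.0, math.inf, False, False),
    "position_x": (-1.0, 1.0, True, True),
    "position_y": (-1.0, 1.0, True, True),
    "rotation_x": (-math.pi, math.pi, True, True),
    "rotation_y": (-math.pi, math.pi, True, True),
    "rotation_z": (-math.pi, math.pi, True, True),
    "amplitude": (0.0, 1.0, False, True),
}


def scene_range_errors(scene: dict) -> list[str]:
    """Return one message per field that is unknown or outside its range."""
    errors = []
    for field, value in scene.items():
        if field not in SCENE_RANGES:
            errors.append(f"{field}: unknown field")
            continue
        low, high, low_inclusive, high_inclusive = SCENE_RANGES[field]
        above = value >= low if low_inclusive else value > low
        below = value <= high if high_inclusive else value < high
        if not (above and below):
            errors.append(f"{field}: {value} is out of range")
    return errors
```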

Animation Configuration

RealtimeAnimation

Configuration for animation outputs including blendshapes and visemes.

Field Type Description
model_name string Optional. Animation model name (default: "default")
outputs RealtimeAnimationOutputType[] Optional. Output types (default: ["blendshapes"])

RealtimeAnimationOutputType

Types of animation data to output.

Allowed Values:

  • blendshapes - Facial blendshapes data
  • viseme_id - Viseme identifier data

Session Configuration

RealtimeRequestSession

Session configuration object used in session.update events.

Field Type Description
model string Optional. Model name to use
modalities RealtimeModality[] Optional. The supported output modalities for the session.

For example, "modalities": ["text", "audio"] is the default setting that enables both text and audio output modalities. To enable only text output, set "modalities": ["text"]. To enable avatar output, set "modalities": ["text", "audio", "avatar"]. You can't enable only audio.
animation RealtimeAnimation Optional. Animation configuration
voice RealtimeVoice Optional. Voice configuration
instructions string Optional. System instructions for the model. The instructions can guide the output audio if OpenAI voices are used, but might not apply to Azure voices.
input_audio_sampling_rate integer Optional. Input audio sampling rate in Hz (default: 24000 for pcm16, 8000 for g711_ulaw and g711_alaw)
input_audio_format RealtimeAudioFormat Optional. Input audio format (default: pcm16)
output_audio_format RealtimeOutputAudioFormat Optional. Output audio format (default: pcm16)
input_audio_noise_reduction RealtimeInputAudioNoiseReductionSettings Configuration for input audio noise reduction. This can be set to null to turn off. Noise reduction filters audio added to the input audio buffer before it is sent to VAD and the model. Filtering the audio can improve VAD and turn detection accuracy (reducing false positives) and model performance by improving perception of the input audio.

This property is nullable.
input_audio_echo_cancellation RealtimeInputAudioEchoCancellationSettings Configuration for input audio echo cancellation. This can be set to null to turn off. This service side echo cancellation can help improve the quality of the input audio by reducing the impact of echo and reverberation.

This property is nullable.
input_audio_transcription RealtimeAudioInputTranscriptionSettings The configuration for input audio transcription. The configuration is null (off) by default. Input audio transcription isn't native to the model, since the model consumes audio directly. Transcription runs asynchronously through the /audio/transcriptions endpoint and should be treated as guidance of input audio content rather than precisely what the model heard. For additional guidance to the transcription service, the client can optionally set the language and prompt for transcription.

This property is nullable.
turn_detection RealtimeTurnDetection The turn detection settings for the session. This can be set to null to turn off.
tools array of RealtimeTool The tools available to the model for the session.
tool_choice RealtimeToolChoice The tool choice for the session.

Allowed values: auto, none, and required. Otherwise, you can specify the name of the function to use.
parallel_tool_calls boolean Optional. Whether the model may issue tool calls in parallel. Defaults to true. Set to false to require tool calls to be issued sequentially.
temperature number The sampling temperature for the model. The allowed temperature values are limited to [0.6, 1.2]. Defaults to 0.8.
max_response_output_tokens integer or "inf" The maximum number of output tokens per assistant response, inclusive of tool calls.

Specify an integer between 1 and 4096 to limit the output tokens. Otherwise, set the value to "inf" to allow the maximum number of tokens.

For example, to limit the output tokens to 1000, set "max_response_output_tokens": 1000. To allow the maximum number of tokens, set "max_response_output_tokens": "inf".

Defaults to "inf".
interim-response InterimResponseConfig Optional. Configuration for interim response generation during latency or tool calls.
reasoning_effort ReasoningEffort Optional. Constrains effort on reasoning for reasoning models. See the Azure Foundry documentation for details. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.
avatar RealtimeAvatarConfig Optional. Avatar configuration
output_audio_timestamp_types RealtimeAudioTimestampType[] Optional. Timestamp types for output audio
metadata map Optional. Set of up to 16 key-value pairs that can be attached to the session. This is useful for storing additional information about the session in a structured format, such as tracking IDs, user context, or application-specific labels. These key-value pairs are also included in Microsoft Foundry resource logs for tracing and diagnostics. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long.
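
Several of the constraints in the table above can be checked locally before a session.update event is sent. The sketch below covers only the limits stated in this table (modalities, temperature, max_response_output_tokens, and metadata); the helper name session_config_errors is hypothetical, and the service remains the authority on what it accepts.

```python
def session_config_errors(session: dict) -> list[str]:
    """Return a message for each documented limit that the session violates."""
    errors = []

    # Audio can't be the only output modality.
    if session.get("modalities") == ["audio"]:
        errors.append("modalities: audio can't be enabled on its own")

    temperature = session.get("temperature")
    if temperature is not None and not 0.6 <= temperature <= 1.2:
        errors.append("temperature: must be in [0.6, 1.2]")

    max_tokens = session.get("max_response_output_tokens", "inf")
    if max_tokens != "inf":
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_int = isinstance(max_tokens, int) and not isinstance(max_tokens, bool)
        if not (is_int and 1 <= max_tokens <= 4096):
            errors.append('max_response_output_tokens: must be 1 to 4096 or "inf"')

    metadata = session.get("metadata") or {}
    if len(metadata) > 16:
        errors.append("metadata: at most 16 key-value pairs")
    for key, value in metadata.items():
        if len(key) > 64:
            errors.append("metadata: key exceeds 64 characters")
        if len(str(value)) > 512:
            errors.append("metadata: value exceeds 512 characters")
    return errors
```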

RealtimeModality

Supported session output modalities.

Allowed Values:

  • text - Text output
  • audio - Audio output
  • animation - Animation output
  • avatar - Avatar video output

RealtimeAudioTimestampType

Output timestamp types supported in audio response content.

Allowed Values:

  • word - Timestamps per word in the output audio

ReasoningEffort

Constrains effort on reasoning for reasoning models. Check model documentation for supported values for each model. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.

Allowed Values:

  • none - No reasoning effort
  • minimal - Minimal reasoning effort
  • low - Low reasoning effort - faster responses with less reasoning
  • medium - Medium reasoning effort - balanced between speed and reasoning depth
  • high - High reasoning effort - more thorough reasoning, may take longer
  • xhigh - Extra high reasoning effort - maximum reasoning depth

Tool Configuration

Two types of tools are supported: function calling and MCP tools, which let you connect to an MCP server.

RealtimeTool

Tool definition for function calling.

Field Type Description
type string Must be "function"
name string Function name
description string Function description and usage guidelines
parameters object Function parameters as JSON schema object

RealtimeToolChoice

Tool selection strategy.

This can be:

  • "auto" - Let the model choose
  • "none" - Don't use tools
  • "required" - Must use a tool
  • { "type": "function", "name": "function_name" } - Use specific function

MCPTool

MCP tool configuration.

Field Type Description
type string Must be "mcp"
server_label string Required. The label of the MCP server.
server_url string Required. The server URL of the MCP server.
allowed_tools string[] Optional. The list of allowed tool names. If not specified, all tools are allowed.
headers object Optional. Additional headers to include in MCP requests.
authorization string Optional. Authorization token for MCP requests.
require_approval string or dictionary Optional.
If set to a string, the value must be never or always.
If set to a dictionary, it must be in the format {"never": ["<tool_name_1>", "<tool_name_2>"], "always": ["<tool_name_3>"]}.
The default value is always.
When set to always, tool execution requires approval: an mcp_approval_request is sent to the client when the MCP arguments are complete, and the tool is executed only after an mcp_approval_response with approve=true is received.
When set to never, the tool will be executed automatically without approval.
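
The two accepted shapes of require_approval can be resolved with a small helper. This is a sketch with a hypothetical name, requires_approval. It assumes that a tool listed under neither key of the dictionary form falls back to the documented default of always; that fallback is an assumption, not documented behavior.

```python
from typing import Optional, Union


def requires_approval(
    require_approval: Optional[Union[str, dict]],
    tool_name: str,
) -> bool:
    """Return True if calling tool_name should wait for client approval."""
    if require_approval is None:
        return True  # documented default is "always"
    if isinstance(require_approval, str):
        if require_approval not in ("always", "never"):
            raise ValueError("require_approval must be 'always' or 'never'")
        return require_approval == "always"
    if tool_name in require_approval.get("never", []):
        return False
    # Tools under "always" need approval. Unlisted tools are ASSUMED to
    # follow the default ("always") as well.
    return True
```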

FoundryAgentTool

Tool definition for integrating a Foundry agent as a tool. This enables a chat-supervisor pattern where a realtime-based chat agent handles basic interactions while delegating complex tasks to a more intelligent Foundry agent.

Field Type Description
type string Must be "foundry_agent"
agent_name string Required. The name of the Foundry agent to call.
agent_version string Optional. The version of the Foundry agent to call.
project_name string Required. The name of the Foundry project containing the agent.
client_id string Optional. The client ID associated with the Foundry agent.
description string Optional. A description for the Foundry agent tool. If provided, it's used instead of the agent's description in the Foundry portal.
foundry_resource_override string Optional. Override for the Foundry resource used to execute the agent.
agent_context_type string Optional. The context type to use when invoking the Foundry agent. Possible values: no_context, agent_context. Default is agent_context.
  • no_context: Only the current user input is sent; no context is maintained.
  • agent_context: The agent maintains its own context (thread); only the current input is sent on each call.
return_agent_response_directly boolean Optional. Whether to return the agent's response directly in the Voice Live response. Default is true. When set to false, the response is sent to the chat agent to rephrase.

Example:

{
  "instructions": "You are a helpful assistant. Please respond with a short message like 'working on this' before calling the agent tool.",
  "tools": [
    {
      "type": "foundry_agent",
      "agent_name": "customer-service-agent",
      "agent_version": "2",
      "project_name": "my-foundry-project",
      "description": "A helpful agent that can search online information and handle complex customer requests"
    }
  ]
}

Interim response configuration

Interim responses allow the system to generate placeholder audio responses while tools are being executed, improving user experience by avoiding silence.

InterimResponseConfig

Configuration for interim response generation. This is a union type that can be one of the following:

StaticInterimResponseConfig

Configuration for static interim response generation. Randomly selects from configured texts when any trigger condition is met.

Field Type Description
type string Must be "static-interim-response".
triggers InterimResponseTrigger[] Optional. List of triggers that can fire the interim response. Any trigger can activate the interim response (OR logic). Supported values: latency, tool. Default is ["latency"].
latency_threshold_ms integer Optional. Latency threshold in milliseconds before triggering interim response. Default is 2000ms. Minimum value is 0.
texts string[] Optional. List of interim response text options to randomly select from.

Example:

{
  "session": {
    "interim-response": {
      "type": "static-interim-response",
      "triggers": ["latency", "tool"],
      "latency_threshold_ms": 1500,
      "texts": [
        "Let me think about that...",
        "One moment please...",
        "Working on that for you..."
      ]
    }
  }
}

LlmInterimResponseConfig

Configuration for LLM-based interim response generation. Uses an LLM to generate context-aware interim responses when any trigger condition is met.

Field Type Description
type string Must be "llm-interim-response".
triggers InterimResponseTrigger[] Optional. List of triggers that can fire the interim response. Any trigger can activate the interim response (OR logic). Supported values: latency, tool. Default is ["latency"].
latency_threshold_ms integer Optional. Latency threshold in milliseconds before triggering interim response. Default is 2000ms. Minimum value is 0.
model string Optional. The model to use for LLM-based interim response generation. Default is gpt-4.1-mini. The default model might change without a new API version.
instructions string Optional. Custom instructions for generating interim responses. If not provided, a default prompt is used.
max_completion_tokens integer Optional. Maximum number of tokens to generate for the interim response. Default is 50. Minimum value is 1.

Example:

{
  "session": {
    "interim-response": {
      "type": "llm-interim-response",
      "triggers": ["tool"],
      "latency_threshold_ms": 2000,
      "model": "gpt-4.1-mini",
      "instructions": "Generate a brief, friendly acknowledgment that you're working on the user's request.",
      "max_completion_tokens": 30
    }
  }
}
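A client might check an LLM interim response configuration against the documented constraints before sending it. The following Python sketch assumes a hypothetical helper name and covers only the limits listed in the table above; the service may enforce additional rules.

```python
ALLOWED_TRIGGERS = {"latency", "tool"}


def validate_llm_interim_config(config):
    """Return a list of problems found; an empty list means the config passes."""
    problems = []
    if config.get("type") != "llm-interim-response":
        problems.append('type must be "llm-interim-response"')
    unknown = set(config.get("triggers", [])) - ALLOWED_TRIGGERS
    if unknown:
        problems.append(f"unsupported triggers: {sorted(unknown)}")
    if config.get("latency_threshold_ms", 0) < 0:
        problems.append("latency_threshold_ms must be at least 0")
    if config.get("max_completion_tokens", 1) < 1:
        problems.append("max_completion_tokens must be at least 1")
    return problems


llm_config = {
    "type": "llm-interim-response",
    "triggers": ["tool"],
    "latency_threshold_ms": 2000,
    "max_completion_tokens": 30,
}
print(validate_llm_interim_config(llm_config))
```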

InterimResponseTrigger

Triggers that can activate interim response generation.

Allowed Values:

  • latency - Trigger interim response when response latency exceeds threshold
  • tool - Trigger interim response when a tool call is being executed
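The OR logic between triggers and the random text selection can be modeled in a few lines of Python. This is a client-side illustration of the documented behavior under stated defaults, not the service's code; the function names are hypothetical.

```python
import random


def should_fire(config, elapsed_ms, tool_running):
    """Return True if any configured trigger condition is met (OR logic)."""
    triggers = config.get("triggers", ["latency"])        # documented default
    threshold = config.get("latency_threshold_ms", 2000)  # documented default
    latency_hit = "latency" in triggers and elapsed_ms > threshold
    tool_hit = "tool" in triggers and tool_running
    return latency_hit or tool_hit


def pick_text(config, rng=random):
    """Pick one of the configured texts at random, or None if there are none."""
    texts = config.get("texts") or []
    return rng.choice(texts) if texts else None


static_config = {
    "type": "static-interim-response",
    "triggers": ["latency", "tool"],
    "latency_threshold_ms": 1500,
    "texts": ["Let me think about that...", "One moment please..."],
}

if should_fire(static_config, elapsed_ms=1600, tool_running=False):
    print(pick_text(static_config))
```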

RealtimeConversationResponseItem

This is a union type that can be one of the following:

RealtimeConversationUserMessageItem

User message item.

Field Type Description
id string The unique ID of the item.
type string Must be "message"
object string Must be "conversation.item"
role string Must be "user"
content RealtimeInputTextContentPart The content of the message.
status RealtimeItemStatus The status of the item.

RealtimeConversationAssistantMessageItem

Assistant message item.

Field Type Description
id string The unique ID of the item.
type string Must be "message"
object string Must be "conversation.item"
role string Must be "assistant"
content RealtimeOutputTextContentPart[] or RealtimeOutputAudioContentPart[] The content of the message.
status RealtimeItemStatus The status of the item.

RealtimeConversationSystemMessageItem

System message item.

Field Type Description
id string The unique ID of the item.
type string Must be "message"
object string Must be "conversation.item"
role string Must be "system"
content RealtimeInputTextContentPart[] The content of the message.
status RealtimeItemStatus The status of the item.

RealtimeConversationFunctionCallItem

Function call request item.

Field Type Description
id string The unique ID of the item.
type string Must be "function_call"
object string Must be "conversation.item"
name string The name of the function to call.
arguments string The arguments for the function call as a JSON string.
call_id string The unique ID of the function call.
status RealtimeItemStatus The status of the item.

RealtimeConversationFunctionCallOutputItem

Function call response item.

Field Type Description
id string The unique ID of the item.
type string Must be "function_call_output"
object string Must be "conversation.item"
name string The name of the function that was called.
output string The output of the function call.
call_id string The unique ID of the function call.
status RealtimeItemStatus The status of the item.

RealtimeConversationMCPListToolsItem

MCP list tools response item.

Field Type Description
id string The unique ID of the item.
type string Must be "mcp_list_tools"
server_label string The label of the MCP server.

RealtimeConversationMCPCallItem

MCP call response item.

Field Type Description
id string The unique ID of the item.
type string Must be "mcp_call"
server_label string The label of the MCP server.
name string The name of the tool to call.
approval_request_id string The approval request ID for the MCP call.
arguments string The arguments for the MCP call.
output string The output of the MCP call.
error object The error details if the MCP call failed.

RealtimeConversationMCPApprovalRequestItem

MCP approval request item.

Field Type Description
id string The unique ID of the item.
type string Must be "mcp_approval_request"
server_label string The label of the MCP server.
name string The name of the tool to call.
arguments string The arguments for the MCP call.

RealtimeConversationFoundryAgentCallItem

Foundry agent call response item.

Field Type Description
id string The unique ID of the item.
type string Must be "foundry_agent_call"
name string The name of the Foundry agent.
call_id string The ID of the call.
arguments string The arguments for the foundry agent call.
agent_response_id string Optional. The response ID from the foundry agent.
output string Optional. The output of the foundry agent call.
error object Optional. The error details if the foundry agent call failed.

RealtimeConversationWebSearchCallItem

Web search call response item.

Field Type Description
id string The unique ID of the web search tool call.
type string Must be "web_search_call"
status string The status of the web search tool call. One of in_progress, searching, completed, failed.

RealtimeConversationFileSearchCallItem

File search call response item.

Field Type Description
id string The unique ID of the file search tool call.
type string Must be "file_search_call"
queries string[] Optional. The queries used for the file search.
status string The status of the file search tool call. One of in_progress, searching, completed, incomplete, failed.
results array of FileSearchResult Optional. The results of the file search.

FileSearchResult

A single file search result entry.

Field Type Description
file_id string Optional. The unique ID of the file.
filename string Optional. The name of the file.
score number Optional. The relevance score of the file search result.
text string Optional. The text content of the file that matched the query.
attributes map Optional. Key-value pairs for filtering file search results.

ActionSearch

A web search action recorded as part of a web search call.

Field Type Description
type string Must be "search".
query string Optional. The search query.
sources array of ActionSearchSource Optional. The sources used in the search.

ActionSearchSource

A source URL referenced by a web search action.

Field Type Description
type string Must be "url".
url string The URL of the source.

ActionOpenPage

An open-page action performed by the model during a web search.

Field Type Description
type string Must be "open_page".
url string The URL opened by the model.

ActionFind

A find-in-page action performed by the model during a web search.

Field Type Description
type string Must be "find".
pattern string The pattern or text to search for within the page.
url string The URL of the page searched for the pattern.
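Because the three web search action objects share a type discriminator, a client can branch on it. The Python sketch below uses a hypothetical describe_action helper and made-up sample values to show one way to do that.

```python
def describe_action(action):
    """Return a one-line summary of a web search action based on its type."""
    kind = action["type"]
    if kind == "search":
        urls = [source["url"] for source in action.get("sources", [])]
        return f"searched for {action.get('query')!r} using {len(urls)} source(s)"
    if kind == "open_page":
        return f"opened {action['url']}"
    if kind == "find":
        return f"looked for {action['pattern']!r} in {action['url']}"
    raise ValueError(f"unknown action type: {kind}")


# Sample values are placeholders.
search_action = {
    "type": "search",
    "query": "voice live",
    "sources": [{"type": "url", "url": "https://example.com"}],
}
print(describe_action(search_action))
```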

TranscriptionPhrase

A transcribed phrase with timing information, returned in conversation.item.input_audio_transcription.completed.

Field Type Description
offset_milliseconds integer Offset from the start of the audio in milliseconds.
duration_milliseconds integer Duration of the phrase in milliseconds.
text string The transcribed text of the phrase.
words array of TranscriptionWord Optional. The individual words in the phrase with timing information.
locale string Optional. The locale of the transcription (for example, en-US).
confidence number Optional. The confidence score of the transcription.

TranscriptionWord

A time-stamped word in a transcription.

Field Type Description
text string The transcribed word text.
offset_milliseconds integer Offset from the start of the audio in milliseconds.
duration_milliseconds integer Duration of the word in milliseconds.
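Since each word carries an offset and a duration, its end time is the sum of the two. The following Python sketch, with a hypothetical helper and sample data, turns a TranscriptionPhrase into a list of word timings.

```python
def word_timeline(phrase):
    """Return (text, start_ms, end_ms) tuples for each word in a phrase."""
    timeline = []
    for word in phrase.get("words", []):  # words is optional
        start = word["offset_milliseconds"]
        end = start + word["duration_milliseconds"]
        timeline.append((word["text"], start, end))
    return timeline


# Hypothetical sample data shaped like a TranscriptionPhrase.
phrase = {
    "offset_milliseconds": 500,
    "duration_milliseconds": 900,
    "text": "hello world",
    "words": [
        {"text": "hello", "offset_milliseconds": 500, "duration_milliseconds": 400},
        {"text": "world", "offset_milliseconds": 950, "duration_milliseconds": 450},
    ],
    "locale": "en-US",
}

for text, start, end in word_timeline(phrase):
    print(f"{start:>6} ms to {end:>6} ms  {text}")
```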

LogProbProperties

Log-probability information for a transcription token.

Field Type Description
token string The token text.
logprob number The natural-log probability of the token.
bytes integer[] Optional. The UTF-8 byte representation of the token.
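Because logprob is a natural logarithm, applying the exponential function recovers the linear probability. The Python sketch below also decodes the optional bytes list; the helper names and sample values are illustrative.

```python
import math


def token_probability(entry):
    """Convert a natural-log probability to a linear probability."""
    return math.exp(entry["logprob"])


def token_text_from_bytes(entry):
    """Decode the optional UTF-8 byte list; fall back to the token field."""
    raw = entry.get("bytes")
    if raw is None:
        return entry["token"]
    return bytes(raw).decode("utf-8", errors="replace")


# Sample entry with made-up values.
sample = {"token": "hi", "logprob": math.log(0.5), "bytes": [104, 105]}
print(token_text_from_bytes(sample), round(token_probability(sample), 3))
```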

RealtimeItemStatus

Status of conversation items.

Allowed Values:

  • in_progress - Currently being processed
  • completed - Successfully completed
  • incomplete - Incomplete (interrupted or failed)

RealtimeContentPart

Content part within a message.

RealtimeInputTextContentPart

Text content part.

Field Type Description
type string Must be "input_text"
text string The text content

RealtimeOutputTextContentPart

Text content part.

Field Type Description
type string Must be "text"
text string The text content

RealtimeInputAudioContentPart

Audio content part.

Field Type Description
type string Must be "input_audio"
audio string Optional. Base64-encoded audio data
transcript string Optional. Audio transcript

RealtimeOutputAudioContentPart

Audio content part.

Field Type Description
type string Must be "audio"
audio string Base64-encoded audio data
transcript string Optional. Audio transcript
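The audio field holds Base64 text, so a client decodes it to get raw bytes. This Python sketch uses a hypothetical helper and a four-byte placeholder payload; how the decoded bytes are interpreted depends on the audio format configured for the session.

```python
import base64


def decode_audio_part(part):
    """Return the raw audio bytes carried by an "audio" content part."""
    if part.get("type") != "audio":
        raise ValueError('expected a content part with type "audio"')
    return base64.b64decode(part["audio"])


# Placeholder payload, not real audio.
sample_part = {
    "type": "audio",
    "audio": base64.b64encode(b"\x00\x01\x02\x03").decode("ascii"),
    "transcript": "hello",
}
print(len(decode_audio_part(sample_part)), "bytes")
```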

RealtimeRequestImageContentPart

Input image content part. Use it in a user message to attach an image alongside text or audio.

Field Type Description
type string Must be "input_image"
image_url string (URI) Optional. URL of the image. Starting in 2026-06-01-preview, this field is named image_url. Earlier API versions expose the same field as url.
detail string Optional. Image detail level.
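A client that targets more than one API version might select the field name from the version string. The Python sketch below is one way to do that; it assumes version strings always begin with a YYYY-MM-DD date and that versions after 2026-06-01-preview keep the image_url name. The helper names are hypothetical.

```python
def image_field_name(api_version):
    """Return the image URL field name for a given api-version string."""
    # Assumption: API versions start with an ISO date (YYYY-MM-DD), so comparing
    # the first 10 characters as strings orders them chronologically.
    return "image_url" if api_version[:10] >= "2026-06-01" else "url"


def make_image_part(api_version, url, detail=None):
    """Build an input_image content part for the given API version."""
    part = {"type": "input_image", image_field_name(api_version): url}
    if detail is not None:
        part["detail"] = detail
    return part


print(make_image_part("2026-06-01-preview", "https://example.com/cat.png"))
print(make_image_part("2026-04-10", "https://example.com/cat.png"))
```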

Response Objects

RealtimeResponse

Response object representing a model inference response.

Field Type Description
id string Optional. Response ID
object string Optional. Always "realtime.response"
status RealtimeResponseStatus Optional. Response status
status_details RealtimeResponseStatusDetails Optional. Status details
output RealtimeConversationResponseItem[] Optional. Output items
usage RealtimeUsage Optional. Token usage statistics
conversation_id string Optional. Associated conversation ID
voice RealtimeVoice Optional. Voice used for response
modalities string[] Optional. Output modalities used
output_audio_format RealtimeOutputAudioFormat Optional. Audio format used
temperature number Optional. Temperature used
max_response_output_tokens integer or "inf" Optional. Max tokens used

RealtimeResponseStatus

Response status values.

Allowed Values:

  • in_progress - Response is being generated
  • completed - Response completed successfully
  • cancelled - Response was cancelled
  • incomplete - Response incomplete (interrupted)
  • failed - Response failed with error

RealtimeUsage

Token usage statistics.

Field Type Description
total_tokens integer Total tokens used
input_tokens integer Input tokens used
output_tokens integer Output tokens generated
input_token_details TokenDetails Breakdown of input tokens
output_token_details TokenDetails Breakdown of output tokens

TokenDetails

Detailed token usage breakdown.

Field Type Description
cached_tokens integer Optional. Cached tokens used
text_tokens integer Optional. Text tokens used
audio_tokens integer Optional. Audio tokens used
reasoning_tokens integer Optional. Reasoning tokens generated in the output. Applies to output token details only.
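As a rough sketch, a client can flatten a RealtimeUsage object into a summary for logging. The helper name and the sample numbers below are made up; optional detail fields default to 0 when absent.

```python
def summarize_usage(usage):
    """Flatten a RealtimeUsage object into a small summary dictionary."""
    in_details = usage.get("input_token_details", {})
    out_details = usage.get("output_token_details", {})
    return {
        "total": usage["total_tokens"],
        "input": usage["input_tokens"],
        "output": usage["output_tokens"],
        "cached_input": in_details.get("cached_tokens", 0),
        "audio_output": out_details.get("audio_tokens", 0),
        "reasoning_output": out_details.get("reasoning_tokens", 0),
    }


# Made-up numbers for illustration.
sample_usage = {
    "total_tokens": 150,
    "input_tokens": 100,
    "output_tokens": 50,
    "input_token_details": {"cached_tokens": 20, "text_tokens": 60, "audio_tokens": 40},
    "output_token_details": {"text_tokens": 10, "audio_tokens": 40},
}
print(summarize_usage(sample_usage))
```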

Error Handling

RealtimeErrorDetails

Error information object.

Field Type Description
type string Error type (e.g., "invalid_request_error", "server_error")
code string Optional. Specific error code
message string Human-readable error description
param string Optional. Parameter related to the error
event_id string Optional. ID of the client event that caused the error
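For logging, the required and optional fields can be combined into a single line. The following Python sketch uses a hypothetical describe_error helper and sample values.

```python
def describe_error(details):
    """Format a RealtimeErrorDetails object as a single log line."""
    parts = [f"[{details['type']}]"]
    if details.get("code"):
        parts.append(f"({details['code']})")
    parts.append(details["message"])
    if details.get("param"):
        parts.append(f"param={details['param']}")
    if details.get("event_id"):
        parts.append(f"event_id={details['event_id']}")
    return " ".join(parts)


# Sample values are placeholders.
sample_error = {
    "type": "invalid_request_error",
    "message": "Bad value",
    "param": "session.voice",
}
print(describe_error(sample_error))
```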

RealtimeConversationRequestItem

You use the RealtimeConversationRequestItem object to create a new item in the conversation via the conversation.item.create event.

This is a union type that can be one of the following:

RealtimeSystemMessageItem

A system message item.

Field Type Description
type string The type of the item.

Allowed values: message
role string The role of the message.

Allowed values: system
content array of RealtimeInputTextContentPart The content of the message.
id string The unique ID of the item. The client can specify the ID to help manage server-side context. If the client doesn't provide an ID, the server generates one.

RealtimeUserMessageItem

A user message item.

Field Type Description
type string The type of the item.

Allowed values: message
role string The role of the message.

Allowed values: user
content array of RealtimeInputTextContentPart or RealtimeInputAudioContentPart The content of the message.
id string The unique ID of the item. The client can specify the ID to help manage server-side context. If the client doesn't provide an ID, the server generates one.

RealtimeAssistantMessageItem

An assistant message item.

Field Type Description
type string The type of the item.

Allowed values: message
role string The role of the message.

Allowed values: assistant
content array of RealtimeOutputTextContentPart The content of the message.

RealtimeFunctionCallItem

A function call item.

Field Type Description
type string The type of the item.

Allowed values: function_call
name string The name of the function to call.
arguments string The arguments of the function call as a JSON string.
call_id string The ID of the function call item.
id string The unique ID of the item. The client can specify the ID to help manage server-side context. If the client doesn't provide an ID, the server generates one.

RealtimeFunctionCallOutputItem

A function call output item.

Field Type Description
type string The type of the item.

Allowed values: function_call_output
call_id string The ID of the function call item.
output string The output of the function call. This is a free-form string that contains the function result; it can be empty.
id string The unique ID of the item. If the client doesn't provide an ID, the server generates one.
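A typical client flow is to parse the arguments of a function call item, run the matching local function, and return the result as a function call output item. The Python sketch below illustrates this under assumptions: get_time and the HANDLERS table are hypothetical, and the "item" envelope key on conversation.item.create isn't described in this section.

```python
import json


def get_time(timezone):
    """Hypothetical local function exposed to the model."""
    return {"timezone": timezone, "time": "12:00"}


HANDLERS = {"get_time": get_time}


def build_function_output_event(call_item):
    """Run the requested function and wrap its result for conversation.item.create."""
    args = json.loads(call_item["arguments"])  # arguments arrive as a JSON string
    result = HANDLERS[call_item["name"]](**args)
    return {
        "type": "conversation.item.create",
        # Assumption: the event carries the new item under an "item" key.
        "item": {
            "type": "function_call_output",
            "call_id": call_item["call_id"],  # ties the output to the call
            "output": json.dumps(result),     # free-form string
        },
    }


# Sample function call item with placeholder values.
call = {
    "type": "function_call",
    "name": "get_time",
    "arguments": '{"timezone": "UTC"}',
    "call_id": "call_1",
}
event = build_function_output_event(call)
print(json.dumps(event, indent=2))
```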

RealtimeMCPApprovalResponseItem

An MCP approval response item.

Field Type Description
type string The type of the item.

Allowed values: mcp_approval_response
approve boolean Whether the MCP request is approved.
approval_request_id string The ID of the MCP approval request.
id string The unique ID of the item. The client can specify the ID to help manage server-side context. If the client doesn't provide an ID, the server generates one.
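An approval response can be derived from the approval request item it answers. This Python sketch assumes that the request item's id is the value expected in approval_request_id; the helper name and sample values are hypothetical.

```python
def build_approval_response(request_item, approve):
    """Build an mcp_approval_response item for an MCP approval request item."""
    if request_item.get("type") != "mcp_approval_request":
        raise ValueError("not an MCP approval request item")
    return {
        "type": "mcp_approval_response",
        "approve": bool(approve),
        # Assumption: the request item's id is the value to echo back here.
        "approval_request_id": request_item["id"],
    }


# Sample request item with placeholder values.
approval_request = {
    "id": "item_123",
    "type": "mcp_approval_request",
    "server_label": "docs",
    "name": "delete_page",
    "arguments": "{}",
}
print(build_approval_response(approval_request, approve=False))
```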

RealtimeFunctionTool

The definition of a function tool as used by the realtime endpoint.

Field Type Description
type string The type of the tool.

Allowed values: function
name string The name of the function.
description string The description of the function, including usage guidelines. For example, "Use this function to get the current time."
parameters object The parameters of the function in the form of a JSON object.

RealtimeResponseAudioContentPart

Field Type Description
type string The type of the content part.

Allowed values: audio
transcript string The transcript of the audio.

This property is nullable.

RealtimeResponseFunctionCallItem

Field Type Description
type string The type of the item.

Allowed values: function_call
name string The name of the function call item.
call_id string The ID of the function call item.
arguments string The arguments of the function call item.
status RealtimeItemStatus The status of the item.

RealtimeResponseFunctionCallOutputItem

Field Type Description
type string The type of the item.

Allowed values: function_call_output
call_id string The ID of the function call item.
output string The output of the function call item.

RealtimeResponseOptions

Field Type Description
modalities array The output modalities for the response.

Allowed values: text, audio

For example, "modalities": ["text", "audio"] is the default setting that enables both text and audio output modalities. To enable only text output, set "modalities": ["text"]. You can't enable only audio.
instructions string The instructions (the system message) to guide the model's responses.
voice RealtimeVoice The voice used for the model response for the session.

Once the voice is used in the session for the model's audio response, it can't be changed.
tools array of RealtimeTool The tools available to the model for the session.
tool_choice RealtimeToolChoice The tool choice for the session.
temperature number The sampling temperature for the model. The allowed temperature values are limited to [0.6, 1.2]. Defaults to 0.8.
max_response_output_tokens integer or "inf" The maximum number of output tokens per assistant response, inclusive of tool calls.

Specify an integer between 1 and 4096 to limit the output tokens. Otherwise, set the value to "inf" to allow the maximum number of tokens.

For example, to limit the output tokens to 1000, set "max_response_output_tokens": 1000. To allow the maximum number of tokens, set "max_response_output_tokens": "inf".

Defaults to "inf".
interim-response InterimResponseConfig Optional. Configuration for interim response generation during latency or tool calls.
reasoning_effort ReasoningEffort Optional. Constrains effort on reasoning for reasoning models. Check model documentation for supported values for each model. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.
conversation string Controls which conversation the response is added to. The supported values are auto and none.

The auto value (or not setting this property) ensures that the contents of the response are added to the session's default conversation.

Set this property to none to create an out-of-band response where items won't be added to the default conversation.

Defaults to "auto".
metadata map Set of up to 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long.

For example: metadata: { topic: "classification" }
interim_response InterimResponseConfig Optional. Configuration for interim response generation during latency or tool calls. Overrides the session-level setting for this response.
pre_generated_assistant_message RealtimeAssistantMessageItem Optional. A pre-generated assistant message to use for generating the audio response instead of having the model generate the text. When provided, the server generates an audio response for the predefined text, bypassing model inference for text generation. The message is added to the conversation context history. The message must have the role set to "assistant" and include content with a single text content part.
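Several RealtimeResponseOptions fields have documented limits that a client can check locally before sending a request. The Python sketch below covers only the limits stated in the table above; the helper name is hypothetical and the service may apply further validation.

```python
def validate_response_options(options):
    """Return a list of problems found; an empty list means the options pass."""
    problems = []
    modalities = options.get("modalities")
    if modalities is not None:
        if not set(modalities) <= {"text", "audio"}:
            problems.append("modalities may only contain text and audio")
        elif "text" not in modalities:
            problems.append("modalities must include text")
    temperature = options.get("temperature")
    if temperature is not None and not 0.6 <= temperature <= 1.2:
        problems.append("temperature must be within [0.6, 1.2]")
    max_tokens = options.get("max_response_output_tokens")
    if max_tokens is not None and max_tokens != "inf":
        if not (isinstance(max_tokens, int) and 1 <= max_tokens <= 4096):
            problems.append('max_response_output_tokens must be 1 to 4096 or "inf"')
    if options.get("conversation", "auto") not in ("auto", "none"):
        problems.append("conversation must be auto or none")
    metadata = options.get("metadata", {})
    if len(metadata) > 16:
        problems.append("metadata allows at most 16 key-value pairs")
    for key, value in metadata.items():
        if len(key) > 64 or len(str(value)) > 512:
            problems.append(f"metadata entry {key!r} exceeds length limits")
    return problems


out_of_band = {
    "modalities": ["text"],
    "conversation": "none",
    "temperature": 0.8,
    "max_response_output_tokens": 1000,
    "metadata": {"topic": "classification"},
}
print(validate_response_options(out_of_band))
```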

RealtimeResponseSession

The RealtimeResponseSession object represents a session in the Realtime API. It's used in some of the server events.

Field Type Description
object string The session object.

Allowed values: realtime.session
id string The unique ID of the session.
model string The model used for the session.
modalities array The output modalities for the session.

Allowed values: text, audio

For example, "modalities": ["text", "audio"] is the default setting that enables both text and audio output modalities. To enable only text output, set "modalities": ["text"]. You can't enable only audio.
instructions string The instructions (the system message) to guide the model's text and audio responses.

Here are some example instructions to help guide content and format of text and audio responses:
"instructions": "be succinct"
"instructions": "act friendly"
"instructions": "here are examples of good responses"

Here are some example instructions to help guide audio behavior:
"instructions": "talk quickly"
"instructions": "inject emotion into your voice"
"instructions": "laugh frequently"

While the model might not always follow these instructions, they provide guidance on the desired behavior.
voice RealtimeVoice The voice used for the model response for the session.

Once the voice is used in the session for the model's audio response, it can't be changed.
input_audio_sampling_rate integer The sampling rate for the input audio.
input_audio_format RealtimeAudioFormat The format for the input audio.
output_audio_format RealtimeAudioFormat The format for the output audio.
input_audio_transcription RealtimeAudioInputTranscriptionSettings The settings for audio input transcription.

This property is nullable.
turn_detection RealtimeTurnDetection The turn detection settings for the session.

This property is nullable.
tools array of RealtimeTool The tools available to the model for the session.
tool_choice RealtimeToolChoice The tool choice for the session.
temperature number The sampling temperature for the model. The allowed temperature values are limited to [0.6, 1.2]. Defaults to 0.8.
max_response_output_tokens integer or "inf" The maximum number of output tokens per assistant response, inclusive of tool calls.

Specify an integer between 1 and 4096 to limit the output tokens. Otherwise, set the value to "inf" to allow the maximum number of tokens.

For example, to limit the output tokens to 1000, set "max_response_output_tokens": 1000. To allow the maximum number of tokens, set "max_response_output_tokens": "inf".
interim-response InterimResponseConfig Configuration for interim response generation during latency or tool calls.

RealtimeResponseStatusDetails

Field Type Description
type RealtimeResponseStatus The status of the response.

RealtimeRateLimitsItem

Field Type Description
name string The rate limit property name that this item includes information about.
limit integer The maximum configured limit for this rate limit property.
remaining integer The remaining quota available against the configured limit for this rate limit property.
reset_seconds number The remaining time, in seconds, until this rate limit property is reset.
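A client can use these fields to decide how long to pause before sending more work. The Python sketch below waits for the longest reset among exhausted limits; the helper name and the limit names requests and tokens are hypothetical sample values.

```python
def seconds_until_available(rate_limits):
    """Return how long to wait, in seconds, until every exhausted limit resets."""
    exhausted = [
        item["reset_seconds"] for item in rate_limits if item["remaining"] <= 0
    ]
    return max(exhausted, default=0.0)


# Sample items with placeholder names and values.
limits = [
    {"name": "requests", "limit": 100, "remaining": 0, "reset_seconds": 1.5},
    {"name": "tokens", "limit": 1000, "remaining": 10, "reset_seconds": 30.0},
]
print(seconds_until_available(limits))
```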