@azure/ai-voicelive package

Classes

VoiceLiveAuthenticationError

Authentication error class for Voice Live operations

VoiceLiveClient

The VoiceLive client provides session management for real-time conversational AI capabilities.

This client acts as a factory for creating VoiceLiveSession instances, which handle the actual WebSocket connections and real-time interactions with the service.

VoiceLiveConnectionError

Base error class for Voice Live WebSocket operations

VoiceLiveError

General Voice Live error class

VoiceLiveProtocolError

Protocol error class for Voice Live message operations

VoiceLiveSession

Represents a WebSocket-based session for real-time voice communication with the Azure VoiceLive service.

This class manages the connection, handles real-time communication, and provides access to all interactive features including audio streaming, conversation management, and avatar control.

Interfaces

AgentConfig

Configuration for the agent.

AgentSessionConfig

Configuration for creating a session with an agent as the main AI actor.

When using an agent session, the agent's configuration (tools, instructions, temperature, etc.) is managed in the Foundry portal, not in session code.

Animation

Configuration for animation outputs including blendshapes and visemes metadata.

AssistantMessageItem

An assistant message item within a conversation.

AudioEchoCancellation

Echo cancellation configuration for server-side audio processing.

AudioInputTranscriptionOptions

Configuration for input audio transcription.

AudioNoiseReduction

Configuration for input audio noise reduction.

AudioStreamOptions
AvatarConfig

Configuration for avatar streaming and behavior during the session.

AzureCustomVoice

Azure custom voice configuration.

AzurePersonalVoice

Azure personal voice configuration.

AzureSemanticDetection

Azure semantic end-of-utterance detection (default).

AzureSemanticDetectionEn

Azure semantic end-of-utterance detection (English-optimized).

AzureSemanticDetectionMultilingual

Azure semantic end-of-utterance detection (multilingual).

AzureSemanticVad

Server Speech Detection (Azure semantic VAD, default variant).

AzureSemanticVadEn

Server Speech Detection (Azure semantic VAD, English-only).

AzureSemanticVadMultilingual

Server Speech Detection (Azure semantic VAD, multilingual).

AzureStandardVoice

Azure standard voice configuration.

AzureVoice

Base for Azure voice configurations.

Background

Defines a video background, either a solid color or an image URL (mutually exclusive).

CachedTokenDetails

Details of cached token usage.

ClientEvent

A voicelive client event.

ClientEventConversationItemCreate

Add a new Item to the Conversation's context, including messages, function calls, and function call responses. This event can be used both to populate a "history" of the conversation and to add new items mid-stream, but has the current limitation that it cannot populate assistant audio messages. If successful, the server will respond with a conversation.item.created event, otherwise an error event will be sent.

ClientEventConversationItemDelete

Send this event when you want to remove any item from the conversation history. The server will respond with a conversation.item.deleted event, unless the item does not exist in the conversation history, in which case the server will respond with an error.

ClientEventConversationItemRetrieve

Send this event when you want to retrieve the server's representation of a specific item in the conversation history. This is useful, for example, to inspect user audio after noise cancellation and VAD. The server will respond with a conversation.item.retrieved event, unless the item does not exist in the conversation history, in which case the server will respond with an error.

ClientEventConversationItemTruncate

Send this event to truncate a previous assistant message’s audio. The server will produce audio faster than realtime, so this event is useful when the user interrupts to truncate audio that has already been sent to the client but not yet played. This will synchronize the server's understanding of the audio with the client's playback. Truncating audio will delete the server-side text transcript to ensure there is no text in the context that hasn't been heard by the user. If successful, the server will respond with a conversation.item.truncated event.
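The truncation point is expressed in milliseconds of played audio. A minimal sketch of computing it from played samples, assuming the default 24kHz pcm16 output rate described in this reference; the wire-level field names below (item_id, content_index, audio_end_ms) follow the realtime-style event naming used here and should be verified against the ClientEventConversationItemTruncate type:

```typescript
// Sketch: build a conversation.item.truncate payload at the point the
// client actually played. Field names are assumptions drawn from the
// event naming in this reference, not a confirmed SDK signature.
const SAMPLE_RATE_HZ = 24000; // default pcm16 output sampling rate

function buildTruncateEvent(itemId: string, samplesPlayed: number) {
  return {
    type: "conversation.item.truncate" as const,
    item_id: itemId,
    content_index: 0, // first (audio) content part -- assumption
    audio_end_ms: Math.floor((samplesPlayed / SAMPLE_RATE_HZ) * 1000),
  };
}
```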

ClientEventInputAudioBufferAppend

Send this event to append audio bytes to the input audio buffer. The audio buffer is temporary storage you can write to and later commit. In Server VAD mode, the audio buffer is used to detect speech and the server will decide when to commit. When Server VAD is disabled, you must commit the audio buffer manually.

The client may choose how much audio to place in each event, up to a maximum of 15 MiB; for example, streaming smaller chunks from the client may allow the VAD to be more responsive. Unlike most other client events, the server will not send a confirmation response to this event.
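A minimal sketch of splitting captured PCM bytes into input_audio_buffer.append payloads. The event shape follows the event names in this reference; the chunk size is an illustrative choice, well under the 15 MiB per-event limit, and the exact SDK send method is not shown here:

```typescript
// Sketch: chunk raw audio bytes into base64-encoded append events.
// MAX_CHUNK is an assumption chosen to keep server VAD responsive.
const MAX_CHUNK = 32 * 1024; // 32 KiB per event

interface InputAudioBufferAppend {
  type: "input_audio_buffer.append";
  audio: string; // base64-encoded audio bytes
}

function toAppendEvents(pcm: Uint8Array): InputAudioBufferAppend[] {
  const events: InputAudioBufferAppend[] = [];
  for (let offset = 0; offset < pcm.length; offset += MAX_CHUNK) {
    const chunk = pcm.subarray(offset, offset + MAX_CHUNK);
    events.push({
      type: "input_audio_buffer.append",
      audio: Buffer.from(chunk).toString("base64"),
    });
  }
  return events;
}
```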

ClientEventInputAudioBufferClear

Send this event to clear the audio bytes in the buffer. The server will respond with an input_audio_buffer.cleared event.

ClientEventInputAudioBufferCommit

Send this event to commit the user input audio buffer, which will create a new user message item in the conversation. This event will produce an error if the input audio buffer is empty. When in Server VAD mode, the client does not need to send this event, the server will commit the audio buffer automatically. Committing the input audio buffer will trigger input audio transcription (if enabled in session configuration), but it will not create a response from the model. The server will respond with an input_audio_buffer.committed event.

ClientEventInputAudioClear

Clears all input audio currently being streamed.

ClientEventInputAudioTurnAppend

Appends audio data to an ongoing input turn.

ClientEventInputAudioTurnCancel

Cancels an in-progress input audio turn.

ClientEventInputAudioTurnEnd

Marks the end of an audio input turn.

ClientEventInputAudioTurnStart

Indicates the start of a new audio input turn.

ClientEventResponseCancel

Send this event to cancel an in-progress response. The server will respond with a response.cancelled event or an error if there is no response to cancel.

ClientEventResponseCreate

This event instructs the server to create a Response, which means triggering model inference. When in Server VAD mode, the server will create Responses automatically. A Response will include at least one Item, and may have two, in which case the second will be a function call. These Items will be appended to the conversation history. The server will respond with a response.created event, events for Items and content created, and finally a response.done event to indicate the Response is complete. The response.create event includes inference configuration such as instructions and temperature. These fields will override the Session's configuration for this Response only.
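A sketch of a response.create payload that overrides session defaults for a single response. The property names below (response, instructions, temperature) are assumptions consistent with the description above; check them against ResponseCreateParams before relying on them:

```typescript
// Sketch: per-response override of session configuration. These fields
// apply to this Response only, per the description above; field names
// are assumptions to be verified against ResponseCreateParams.
const responseCreate = {
  type: "response.create" as const,
  response: {
    instructions: "Answer in one short sentence.",
    temperature: 0.6,
  },
};
```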

ClientEventSessionAvatarConnect

Sent when the client connects and provides its SDP (Session Description Protocol) for avatar-related media negotiation.

ClientEventSessionUpdate

Send this event to update the session’s default configuration. The client may send this event at any time to update any field, except for voice. However, note that once a session has been initialized with a particular model, it can’t be changed to another model using session.update. When the server receives a session.update, it will respond with a session.updated event showing the full, effective configuration. Only the fields that are present are updated. To clear a field like instructions, pass an empty string.
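A sketch of a session.update payload illustrating the partial-update semantics described above: only the fields present are changed, and an empty string clears a text field such as instructions. The turn_detection field name and its "server_vad" discriminator are assumptions based on the TurnDetection and ServerVad types in this reference:

```typescript
// Sketch: partial session update. Only the listed fields change; an
// empty string clears instructions. Field names are assumptions to be
// checked against the RequestSession / TurnDetection types.
const sessionUpdate = {
  type: "session.update" as const,
  session: {
    instructions: "", // clears previously set instructions
    turn_detection: { type: "server_vad" }, // assumption: discriminator value
  },
};
```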

ConnectOptions
ConnectedEventArgs

Arguments provided when a connection is established

ConnectionContext

Context information provided to connection-related handlers

ContentPart

Base for any content part; discriminated by type.

ConversationItemBase

The item to add to the conversation.

ConversationRequestItem

Base for any conversation request item; discriminated by type.

CreateSessionOptions
DisconnectedEventArgs

Arguments provided when a connection is lost

EouDetection

Top-level union for end-of-utterance (EOU) semantic detection configuration.

ErrorEventArgs

Arguments provided when an error occurs

ErrorResponse

Standard error response envelope.

FunctionCallItem

A function call item within a conversation.

FunctionCallOutputItem

A function call output item within a conversation.

FunctionTool

The definition of a function tool as used by the voicelive endpoint.

IceServer

ICE server configuration for WebRTC connection negotiation.

InputAudioContentPart

Input audio content part.

InputTextContentPart

Input text content part.

InputTokenDetails

Details of input token usage.

InterimResponseConfigBase

Base model for interim response configuration.

LlmInterimResponseConfig

Configuration for LLM-based interim response generation. Uses LLM to generate context-aware interim responses when any trigger condition is met.

LogProbProperties

A single log probability entry for a token.

MCPApprovalResponseRequestItem

A request item that represents a response to an MCP approval request.

MCPServer

The definition of an MCP server as used by the voicelive endpoint.

MCPTool

Represents a mcp tool definition.

MessageContentPart

Base for any message content part; discriminated by type.

MessageItem

A message item within a conversation.

OpenAIVoice

OpenAI voice configuration with explicit type field.

This provides a unified interface for OpenAI voices, complementing the existing string-based OAIVoice for backward compatibility.

OutputTextContentPart

Output text content part.

OutputTokenDetails

Details of output token usage.

RequestAudioContentPart

An audio content part for a request. This is supported only by realtime models (e.g., gpt-realtime). For text-based models, use input_text instead.

RequestImageContentPart

Input image content part.

RequestSession

Base for session configuration shared between request and response.

RequestTextContentPart

A text content part for a request.

Response

The response resource.

ResponseAudioContentPart

An audio content part for a response.

ResponseCancelledDetails

Details for a cancelled response.

ResponseCreateParams

Create a new VoiceLive response with these parameters

ResponseFailedDetails

Details for a failed response.

ResponseFunctionCallItem

A function call item within a conversation.

ResponseFunctionCallOutputItem

A function call output item within a conversation.

ResponseIncompleteDetails

Details for an incomplete response.

ResponseItem

Base for any response item; discriminated by type.

ResponseMCPApprovalRequestItem

A response item that represents a request for approval to call an MCP tool.

ResponseMCPApprovalResponseItem

A response item that represents a response to an MCP approval request.

ResponseMCPCallItem

A response item that represents a call to an MCP tool.

ResponseMCPListToolItem

A response item that lists the tools available on an MCP server.

ResponseMessageItem

Base type for message item within a conversation.

ResponseSession

Base for session configuration in the response.

ResponseStatusDetails

Base for all non-success response details.

ResponseTextContentPart

A text content part for a response.

SendEventOptions
ServerEvent

A voicelive server event.

ServerEventConversationItemCreated

Returned when a conversation item is created. There are several scenarios that produce this event:

  • The server is generating a Response, which if successful will produce either one or two Items, which will be of type message (role assistant) or type function_call.
  • The input audio buffer has been committed, either by the client or the server (in server_vad mode). The server will take the content of the input audio buffer and add it to a new user message Item.
  • The client has sent a conversation.item.create event to add a new Item to the Conversation.
ServerEventConversationItemDeleted

Returned when an item in the conversation is deleted by the client with a conversation.item.delete event. This event is used to synchronize the server's understanding of the conversation history with the client's view.

ServerEventConversationItemInputAudioTranscriptionCompleted

This event is the output of audio transcription for user audio written to the user audio buffer. Transcription begins when the input audio buffer is committed by the client or server (in server_vad mode). Transcription runs asynchronously with Response creation, so this event may come before or after the Response events. VoiceLive API models accept audio natively, and thus input transcription is a separate process run on a separate ASR (Automatic Speech Recognition) model. The transcript may diverge somewhat from the model's interpretation, and should be treated as a rough guide.

ServerEventConversationItemInputAudioTranscriptionDelta

Returned when the text value of an input audio transcription content part is updated.

ServerEventConversationItemInputAudioTranscriptionFailed

Returned when input audio transcription is configured, and a transcription request for a user message failed. These events are separate from other error events so that the client can identify the related Item.

ServerEventConversationItemRetrieved

Returned when a conversation item is retrieved with conversation.item.retrieve.

ServerEventConversationItemTruncated

Returned when an earlier assistant audio message item is truncated by the client with a conversation.item.truncate event. This event is used to synchronize the server's understanding of the audio with the client's playback. This action will truncate the audio and remove the server-side text transcript to ensure there is no text in the context that hasn't been heard by the user.

ServerEventError

Returned when an error occurs, which could be a client problem or a server problem. Most errors are recoverable and the session will stay open; we recommend that implementers monitor and log error messages by default.

ServerEventErrorDetails

Details of the error.

ServerEventInputAudioBufferCleared

Returned when the input audio buffer is cleared by the client with an input_audio_buffer.clear event.

ServerEventInputAudioBufferCommitted

Returned when an input audio buffer is committed, either by the client or automatically in server VAD mode. The item_id property is the ID of the user message item that will be created, thus a conversation.item.created event will also be sent to the client.

ServerEventInputAudioBufferSpeechStarted

Sent by the server when in server_vad mode to indicate that speech has been detected in the audio buffer. This can happen any time audio is added to the buffer (unless speech is already detected). The client may want to use this event to interrupt audio playback or provide visual feedback to the user. The client should expect to receive an input_audio_buffer.speech_stopped event when speech stops. The item_id property is the ID of the user message item that will be created when speech stops and will also be included in the input_audio_buffer.speech_stopped event (unless the client manually commits the audio buffer during VAD activation).

ServerEventInputAudioBufferSpeechStopped

Returned in server_vad mode when the server detects the end of speech in the audio buffer. The server will also send a conversation.item.created event with the user message item that is created from the audio buffer.
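Barge-in handling for the two speech events above can be sketched as a plain dispatch on the wire-level type field. The SDK's typed handlers (VoiceLiveSessionHandlers) wrap the same events, but their exact handler names are not shown in this listing, so this sketch stays at the protocol level; PlaybackControl is a hypothetical local interface:

```typescript
// Sketch: interrupt local playback when server VAD detects user speech.
// PlaybackControl is a hypothetical client-side abstraction, not part of
// the @azure/ai-voicelive package.
type PlaybackControl = { stop(): void; clearQueue(): void };

function handleServerEvent(event: { type: string }, playback: PlaybackControl): string {
  switch (event.type) {
    case "input_audio_buffer.speech_started":
      // User started talking: stop playback and drop queued audio.
      playback.stop();
      playback.clearQueue();
      return "interrupted";
    case "input_audio_buffer.speech_stopped":
      // Server will commit the buffer and create a user item automatically.
      return "awaiting-response";
    default:
      return "ignored";
  }
}
```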

ServerEventMcpListToolsCompleted

MCP list tools completed message.

ServerEventMcpListToolsFailed

MCP list tools failed message.

ServerEventMcpListToolsInProgress

MCP list tools in progress message.

ServerEventResponseAnimationBlendshapeDelta

Represents a delta update of blendshape animation frames for a specific output of a response.

ServerEventResponseAnimationBlendshapeDone

Indicates the completion of blendshape animation processing for a specific output of a response.

ServerEventResponseAnimationVisemeDelta

Represents a viseme ID delta update for animation based on audio.

ServerEventResponseAnimationVisemeDone

Indicates completion of viseme animation delivery for a response.

ServerEventResponseAudioDelta

Returned when the model-generated audio is updated.

ServerEventResponseAudioDone

Returned when the model-generated audio is done. Also emitted when a Response is interrupted, incomplete, or cancelled.

ServerEventResponseAudioTimestampDelta

Represents a word-level audio timestamp delta for a response.

ServerEventResponseAudioTimestampDone

Indicates completion of audio timestamp delivery for a response.

ServerEventResponseAudioTranscriptDelta

Returned when the model-generated transcription of audio output is updated.

ServerEventResponseAudioTranscriptDone

Returned when the model-generated transcription of audio output is done streaming. Also emitted when a Response is interrupted, incomplete, or cancelled.

ServerEventResponseContentPartAdded

Returned when a new content part is added to an assistant message item during response generation.

ServerEventResponseContentPartDone

Returned when a content part is done streaming in an assistant message item. Also emitted when a Response is interrupted, incomplete, or cancelled.

ServerEventResponseCreated

Returned when a new Response is created. The first event of response creation, where the response is in an initial state of in_progress.

ServerEventResponseDone

Returned when a Response is done streaming. Always emitted, no matter the final state. The Response object included in the response.done event will include all output Items in the Response but will omit the raw audio data.

ServerEventResponseFunctionCallArgumentsDelta

Returned when the model-generated function call arguments are updated.

ServerEventResponseFunctionCallArgumentsDone

Returned when the model-generated function call arguments are done streaming. Also emitted when a Response is interrupted, incomplete, or cancelled.

ServerEventResponseMcpCallArgumentsDelta

Represents a delta update of the arguments for an MCP tool call.

ServerEventResponseMcpCallArgumentsDone

Indicates the completion of the arguments for an MCP tool call.

ServerEventResponseMcpCallCompleted

Indicates the MCP call has completed.

ServerEventResponseMcpCallFailed

Indicates the MCP call has failed.

ServerEventResponseMcpCallInProgress

Indicates the MCP call is running.

ServerEventResponseOutputItemAdded

Returned when a new Item is created during Response generation.

ServerEventResponseOutputItemDone

Returned when an Item is done streaming. Also emitted when a Response is interrupted, incomplete, or cancelled.

ServerEventResponseTextDelta

Returned when the text value of a "text" content part is updated.

ServerEventResponseTextDone

Returned when the text value of a "text" content part is done streaming. Also emitted when a Response is interrupted, incomplete, or cancelled.

ServerEventSessionAvatarConnecting

Sent when the server is in the process of establishing an avatar media connection and provides its SDP answer.

ServerEventSessionCreated

Returned when a Session is created. Emitted automatically when a new connection is established as the first server event. This event will contain the default Session configuration.

ServerEventSessionUpdated

Returned when a session is updated with a session.update event, unless there is an error.

ServerVad

Base model for VAD-based turn detection.

SessionBase

VoiceLive session object configuration.

SessionContext

Context information provided to session-related handlers

StartSessionOptions
StaticInterimResponseConfig

Configuration for static interim response generation. Randomly selects from configured texts when any trigger condition is met.

SystemMessageItem

A system message item within a conversation.

TokenUsage

Overall usage statistics for a response.

Tool

The base representation of a voicelive tool definition.

ToolChoiceFunctionSelection

The representation of a voicelive tool_choice selecting a named function tool.

ToolChoiceSelection

A base representation for a voicelive tool_choice selecting a named tool.

TurnDetection

Top-level union for turn detection configuration.

TurnOptions
UserMessageItem

A user message item within a conversation.

VideoCrop

Defines a video crop rectangle using top-left and bottom-right coordinates.

VideoParams

Video streaming parameters for avatar.

VideoResolution

Resolution of the video feed in pixels.

VoiceLiveClientOptions
VoiceLiveErrorDetails

Error object returned in case of API failure.

VoiceLiveSessionHandlers

Handler functions for VoiceLive session events following Azure SDK patterns.

ALL handlers are optional - implement only the events you care about! Each handler receives strongly-typed event data and context information.
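Because every handler is optional, a client can supply only the handlers it cares about. The handler property names below (onError, onResponseDone) are illustrative assumptions; check the VoiceLiveSessionHandlers interface for the real names and event payload types:

```typescript
// Sketch: a partial handlers object. Property names are hypothetical;
// consult VoiceLiveSessionHandlers for the actual interface.
const handlers = {
  onError: (e: { message: string }) => console.error("voicelive error:", e.message),
  onResponseDone: (r: { id?: string }) => console.log("response complete", r.id),
};
```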

VoiceLiveSessionOptions
VoiceLiveSubscription

Represents an active subscription to VoiceLive session events

Type Aliases

AnimationOutputType

Specifies the types of animation data to output.
KnownAnimationOutputType can be used interchangeably with AnimationOutputType, this enum contains the known values that the service supports.

Known values supported by the service

blendshapes: Blendshapes output type.
viseme_id: Viseme ID output type.

AudioTimestampType

Output timestamp types supported in audio response content.
KnownAudioTimestampType can be used interchangeably with AudioTimestampType, this enum contains the known values that the service supports.

Known values supported by the service

word: Timestamps per word in the output audio.

AvatarConfigTypes

Avatar config types
KnownAvatarConfigTypes can be used interchangeably with AvatarConfigTypes, this enum contains the known values that the service supports.

Known values supported by the service

video-avatar: Video avatar
photo-avatar: Photo avatar

AvatarOutputProtocol

Avatar config output protocols
KnownAvatarOutputProtocol can be used interchangeably with AvatarOutputProtocol, this enum contains the known values that the service supports.

Known values supported by the service

webrtc: WebRTC protocol, output the audio/video streams via WebRTC
websocket: WebSocket protocol, output the video frames over WebSocket

AzureVoiceType

Union of all supported Azure voice types.
KnownAzureVoiceType can be used interchangeably with AzureVoiceType, this enum contains the known values that the service supports.

Known values supported by the service

azure-custom: Azure custom voice.
azure-standard: Azure standard voice.
azure-personal: Azure personal voice.

AzureVoiceUnion

Alias for AzureVoiceUnion

ClientEventType

Client event types used in VoiceLive protocol.
KnownClientEventType can be used interchangeably with ClientEventType, this enum contains the known values that the service supports.

Known values supported by the service

session.update
input_audio_buffer.append
input_audio_buffer.commit
input_audio_buffer.clear
input_audio.turn.start
input_audio.turn.append
input_audio.turn.end
input_audio.turn.cancel
input_audio.clear
conversation.item.create
conversation.item.retrieve
conversation.item.truncate
conversation.item.delete
response.create
response.cancel
session.avatar.connect
mcp_approval_response

ClientEventUnion

Alias for ClientEventUnion

ContentPartType

Type of ContentPartType

ContentPartUnion

Alias for ContentPartUnion

ConversationRequestItemUnion

Alias for ConversationRequestItemUnion

EouDetectionUnion

Alias for EouDetectionUnion

EouThresholdLevel

Threshold level settings for Azure semantic end-of-utterance detection.
KnownEouThresholdLevel can be used interchangeably with EouThresholdLevel, this enum contains the known values that the service supports.

Known values supported by the service

low: Low sensitivity threshold level.
medium: Medium sensitivity threshold level.
high: High sensitivity threshold level.
default: Default sensitivity threshold level.

InputAudioFormat

Input audio format types supported.
KnownInputAudioFormat can be used interchangeably with InputAudioFormat, this enum contains the known values that the service supports.

Known values supported by the service

pcm16: 16-bit PCM audio format at default sampling rate (24kHz)
g711_ulaw: G.711 μ-law (mu-law) audio format at 8kHz sampling rate
g711_alaw: G.711 A-law audio format at 8kHz sampling rate
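Browser audio capture (Web Audio) yields float samples in [-1, 1], so feeding the pcm16 input format requires a conversion to 16-bit little-endian PCM. A minimal sketch (the scaling choice is a common convention, not mandated by this reference):

```typescript
// Sketch: convert float samples (-1..1) to 16-bit little-endian PCM
// bytes for the pcm16 input format. Clamps out-of-range samples and
// scales asymmetrically so +/-1.0 both fit the int16 range.
function floatTo16BitPCM(samples: Float32Array): Uint8Array {
  const out = new Uint8Array(samples.length * 2);
  const view = new DataView(out.buffer);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return out;
}
```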

InterimResponseConfig

Union of interim response configuration types.

InterimResponseConfigBaseUnion

Alias for InterimResponseConfigBaseUnion

InterimResponseConfigType

Interim response configuration types.
KnownInterimResponseConfigType can be used interchangeably with InterimResponseConfigType, this enum contains the known values that the service supports.

Known values supported by the service

static_interim_response: Static interim response configuration type.
llm_interim_response: LLM-based interim response configuration type.

InterimResponseTrigger

Triggers that can activate interim response generation.
KnownInterimResponseTrigger can be used interchangeably with InterimResponseTrigger, this enum contains the known values that the service supports.

Known values supported by the service

latency: Trigger interim response when response latency exceeds threshold.
tool: Trigger interim response when a tool call is being executed.

ItemParamStatus

Indicates the processing status of an item or parameter.
KnownItemParamStatus can be used interchangeably with ItemParamStatus, this enum contains the known values that the service supports.

Known values supported by the service

completed: Item or parameter has been fully processed and is complete.
incomplete: Item or parameter is not yet complete.

ItemType

Type of ItemType

MCPApprovalType

The available set of MCP approval types.
KnownMCPApprovalType can be used interchangeably with MCPApprovalType, this enum contains the known values that the service supports.

Known values supported by the service

never: Approval is never required.
always: Approval is always required.

MessageContentPartUnion

Alias for MessageContentPartUnion

MessageItemUnion

Alias for MessageItemUnion

MessageRole

Type of MessageRole

Modality

Supported modalities for the session.
KnownModality can be used interchangeably with Modality, this enum contains the known values that the service supports.

Known values supported by the service

text: Text modality.
audio: Audio modality.
animation: Animation modality.
avatar: Avatar modality.

OAIVoice

Supported OpenAI voice names (string enum).
KnownOAIVoice can be used interchangeably with OAIVoice, this enum contains the known values that the service supports.

Known values supported by the service

alloy: Alloy voice.
ash: Ash voice.
ballad: Ballad voice.
coral: Coral voice.
echo: Echo voice.
sage: Sage voice.
shimmer: Shimmer voice.
verse: Verse voice.
marin: Marin voice.
cedar: Cedar voice.

OutputAudioFormat

Output audio format types supported.
KnownOutputAudioFormat can be used interchangeably with OutputAudioFormat, this enum contains the known values that the service supports.

Known values supported by the service

pcm16: 16-bit PCM audio format at default sampling rate (24kHz)
pcm16_8000hz: 16-bit PCM audio format at 8kHz sampling rate
pcm16_16000hz: 16-bit PCM audio format at 16kHz sampling rate
g711_ulaw: G.711 μ-law (mu-law) audio format at 8kHz sampling rate
g711_alaw: G.711 A-law audio format at 8kHz sampling rate

PersonalVoiceModels

PersonalVoice models
KnownPersonalVoiceModels can be used interchangeably with PersonalVoiceModels, this enum contains the known values that the service supports.

Known values supported by the service

DragonLatestNeural: Use the latest Dragon model.
PhoenixLatestNeural: Use the latest Phoenix model.
PhoenixV2Neural: Use the Phoenix V2 model.

PhotoAvatarBaseModes

Photo avatar base modes
KnownPhotoAvatarBaseModes can be used interchangeably with PhotoAvatarBaseModes, this enum contains the known values that the service supports.

Known values supported by the service

vasa-1: VASA-1 model

ReasoningEffort

Constrains effort on reasoning for reasoning models. Check model documentation for supported values for each model. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.
KnownReasoningEffort can be used interchangeably with ReasoningEffort, this enum contains the known values that the service supports.

Known values supported by the service

none: No reasoning effort.
minimal: Minimal reasoning effort.
low: Low reasoning effort - faster responses with less reasoning.
medium: Medium reasoning effort - balanced between speed and reasoning depth.
high: High reasoning effort - more thorough reasoning, may take longer.
xhigh: Extra high reasoning effort - maximum reasoning depth.

RequestImageContentPartDetail

Specifies an image's detail level. Can be 'auto', 'low', 'high', or an unknown future value.
KnownRequestImageContentPartDetail can be used interchangeably with RequestImageContentPartDetail, this enum contains the known values that the service supports.

Known values supported by the service

auto: Automatically select an appropriate detail level.
low: Use a lower detail level to reduce bandwidth or cost.
high: Use a higher detail level—potentially more resource-intensive.

ResponseItemStatus

Indicates the processing status of a response item.
KnownResponseItemStatus can be used interchangeably with ResponseItemStatus, this enum contains the known values that the service supports.

Known values supported by the service

in_progress: Item that is in progress.
completed: Item has been fully processed and is complete.
incomplete: Item has been processed but is incomplete.

ResponseItemUnion

Alias for ResponseItemUnion

ResponseStatus

Terminal status of a response.
KnownResponseStatus can be used interchangeably with ResponseStatus, this enum contains the known values that the service supports.

Known values supported by the service

completed
cancelled
failed
incomplete
in_progress

ResponseStatusDetailsUnion

Alias for ResponseStatusDetailsUnion

ServerEventType

Server event types used in VoiceLive protocol.
KnownServerEventType can be used interchangeably with ServerEventType, this enum contains the known values that the service supports.

Known values supported by the service

error
session.avatar.connecting
session.created
session.updated
conversation.item.input_audio_transcription.completed
conversation.item.input_audio_transcription.delta
conversation.item.input_audio_transcription.failed
conversation.item.created
conversation.item.retrieved
conversation.item.truncated
conversation.item.deleted
input_audio_buffer.committed
input_audio_buffer.cleared
input_audio_buffer.speech_started
input_audio_buffer.speech_stopped
response.created
response.done
response.output_item.added
response.output_item.done
response.content_part.added
response.content_part.done
response.text.delta
response.text.done
response.audio_transcript.delta
response.audio_transcript.done
response.audio.delta
response.audio.done
response.animation_blendshapes.delta
response.animation_blendshapes.done
response.audio_timestamp.delta
response.audio_timestamp.done
response.animation_viseme.delta
response.animation_viseme.done
response.function_call_arguments.delta
response.function_call_arguments.done
mcp_list_tools.in_progress
mcp_list_tools.completed
mcp_list_tools.failed
response.mcp_call_arguments.delta
response.mcp_call_arguments.done
response.mcp_call.in_progress
response.mcp_call.completed
response.mcp_call.failed

ServerEventUnion

Alias for ServerEventUnion

SessionTarget

Target for a Voice Live session, specifying either a model or an agent.

Use { model: string } for model-centric sessions where the LLM is the main actor. Use { agent: AgentSessionConfig } for agent-centric sessions where the agent is the main actor.

Example

Model-centric session

import { DefaultAzureCredential } from "@azure/identity";
import { VoiceLiveClient } from "@azure/ai-voicelive";

const credential = new DefaultAzureCredential();
const endpoint = "https://your-resource.cognitiveservices.azure.com";
const client = new VoiceLiveClient(endpoint, credential);

const session = client.createSession({ model: "gpt-4o-realtime-preview" });

Example

Agent-centric session

import { DefaultAzureCredential } from "@azure/identity";
import { VoiceLiveClient } from "@azure/ai-voicelive";

const credential = new DefaultAzureCredential();
const endpoint = "https://your-resource.cognitiveservices.azure.com";
const client = new VoiceLiveClient(endpoint, credential);

const session = client.createSession({
  agent: { agentName: "my-agent", projectName: "my-project" },
});

ToolChoice

The combined set of available representations for a voicelive tool_choice parameter, encompassing both string literal options like 'auto' and structured references to defined tools.

ToolChoiceLiteral

The available set of mode-level, string literal tool_choice options for the voicelive endpoint.
KnownToolChoiceLiteral can be used interchangeably with ToolChoiceLiteral; this enum contains the known values that the service supports.

Known values supported by the service

auto: Specifies that the model should freely determine which tool or tools, if any, to call.
none: Specifies that the model should call no tools whatsoever.
required: Specifies that the model should call at least one tool.
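
A quick sketch of narrowing a value to the three documented string literals; the type and guard below are local illustrations, not the SDK's ToolChoiceLiteral type (which, per the ToolChoice description, can also be a structured reference to a defined tool):

```typescript
// Local stand-in for the documented string-literal tool_choice options.
type ToolChoiceLiteralLike = "auto" | "none" | "required";

// Returns true only for the three documented literal values.
function isToolChoiceLiteral(value: unknown): value is ToolChoiceLiteralLike {
  return value === "auto" || value === "none" || value === "required";
}
```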

ToolChoiceSelectionUnion

Alias for ToolChoiceSelectionUnion

ToolType

The supported tool type discriminators for voicelive tools.
KnownToolType can be used interchangeably with ToolType; this enum contains the known values that the service supports.

Known values supported by the service

function
mcp

ToolUnion

Alias for ToolUnion

TurnDetectionType

The type discriminator for turn detection configurations.

TurnDetectionUnion

Alias for TurnDetectionUnion

Voice

Union of all supported voice configurations.

Enums

ConnectionState

Connection state enumeration for lifecycle management

KnownAnimationOutputType

Specifies the types of animation data to output.

KnownAudioTimestampType

Output timestamp types supported in audio response content.

KnownAvatarConfigTypes

Avatar config types

KnownAvatarOutputProtocol

Avatar config output protocols

KnownAzureVoiceType

Union of all supported Azure voice types.

KnownClientEventType

Client event types used in VoiceLive protocol.

KnownContentPartType

Known values of ContentPartType that the service accepts.

KnownEouThresholdLevel

Threshold level settings for Azure semantic end-of-utterance detection.

KnownInputAudioFormat

Input audio format types supported.

KnownInterimResponseConfigType

Interim response configuration types.

KnownInterimResponseTrigger

Triggers that can activate interim response generation.

KnownItemParamStatus

Indicates the processing status of an item or parameter.

KnownItemType

Known values of ItemType that the service accepts.

KnownMCPApprovalType

The available set of MCP approval types.

KnownMessageRole

Known values of MessageRole that the service accepts.

KnownModality

Supported modalities for the session.

KnownOAIVoice

Supported OpenAI voice names (string enum).

KnownOutputAudioFormat

Output audio format types supported.

KnownPersonalVoiceModels

PersonalVoice models

KnownPhotoAvatarBaseModes

Photo avatar base modes

KnownReasoningEffort

Constrains effort on reasoning for reasoning models. Check model documentation for supported values for each model. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.

KnownRequestImageContentPartDetail

Specifies an image's detail level. Can be 'auto', 'low', 'high', or an unknown future value.

KnownResponseItemStatus

Indicates the processing status of a response item.

KnownResponseStatus

Terminal status of a response.

KnownServerEventType

Server event types used in VoiceLive protocol.

KnownToolChoiceLiteral

The available set of mode-level, string literal tool_choice options for the voicelive endpoint.

KnownToolType

The supported tool type discriminators for voicelive tools.

KnownTurnDetectionType

Known values of TurnDetectionType that the service accepts.

VoiceLiveErrorCodes

Error codes for Voice Live WebSocket operations

Functions

classifyConnectionError(unknown)

Classifies connection errors

classifyProtocolError(Error, string)

Classifies protocol errors

isAgentSessionTarget(SessionTarget)

Type guard to check if a SessionTarget specifies an agent session.

isModelSessionTarget(SessionTarget)

Type guard to check if a SessionTarget specifies a model session.

Function Details

classifyConnectionError(unknown)

Classifies connection errors

function classifyConnectionError(error: unknown): VoiceLiveConnectionError

Parameters

error

unknown

Returns

VoiceLiveConnectionError

classifyProtocolError(Error, string)

Classifies protocol errors

function classifyProtocolError(error: Error, messageType: string): VoiceLiveProtocolError

Parameters

error

Error

messageType

string

Returns

VoiceLiveProtocolError

isAgentSessionTarget(SessionTarget)

Type guard to check if a SessionTarget specifies an agent session.

function isAgentSessionTarget(target: SessionTarget): target is { agent: AgentSessionConfig }

Parameters

target
SessionTarget

The session target to check

Returns

target is { agent: AgentSessionConfig }

True if the target specifies an agent session

isModelSessionTarget(SessionTarget)

Type guard to check if a SessionTarget specifies a model session.

function isModelSessionTarget(target: SessionTarget): target is { model: string }

Parameters

target
SessionTarget

The session target to check

Returns

target is { model: string }

True if the target specifies a model session
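
The two guards above let you branch on the two SessionTarget shapes with full type narrowing. The sketch below mirrors the documented shapes with local stand-in types; in real code, import SessionTarget, isAgentSessionTarget, and isModelSessionTarget from "@azure/ai-voicelive" instead:

```typescript
// Local stand-ins mirroring the documented SessionTarget shapes:
// { model: string } for model-centric sessions,
// { agent: AgentSessionConfig } for agent-centric sessions.
interface AgentSessionConfigLike {
  agentName: string;
  projectName: string;
}

type SessionTargetLike = { model: string } | { agent: AgentSessionConfigLike };

// Illustrative re-implementation of the agent-session type guard.
function isAgentTarget(
  target: SessionTargetLike,
): target is { agent: AgentSessionConfigLike } {
  return "agent" in target;
}

// After the guard, TypeScript narrows `target` to the matching shape.
function describeTarget(target: SessionTargetLike): string {
  if (isAgentTarget(target)) {
    return `agent session: ${target.agent.agentName}`;
  }
  return `model session: ${target.model}`;
}
```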