@azure/ai-voicelive package

Classes

VoiceLiveAuthenticationError

Authentication error class for Voice Live operations

VoiceLiveClient

The VoiceLive client provides session management for real-time conversational AI capabilities.

This client acts as a factory for creating VoiceLiveSession instances, which handle the actual WebSocket connections and real-time interactions with the service.

VoiceLiveConnectionError

Base error class for Voice Live WebSocket operations

VoiceLiveError

General Voice Live error class

VoiceLiveProtocolError

Protocol error class for Voice Live message operations

VoiceLiveSession

Represents a WebSocket-based session for real-time voice communication with the Azure VoiceLive service.

This class manages the connection, handles real-time communication, and provides access to all interactive features including audio streaming, conversation management, and avatar control.

Interfaces

AgentConfig

Configuration for the agent.

AgentSessionConfig

Configuration for creating a session with an agent as the main AI actor.

When using an agent session, the agent's configuration (tools, instructions, temperature, etc.) is managed in the Foundry portal, not in session code.

Animation

Configuration for animation outputs including blendshapes and visemes metadata.

AssistantMessageItem

An assistant message item within a conversation.

AudioEchoCancellation

Echo cancellation configuration for server-side audio processing.

AudioInputTranscriptionOptions

Configuration for input audio transcription.

AudioNoiseReduction

Configuration for input audio noise reduction.

AudioStreamOptions
AvatarConfig

Configuration for avatar streaming and behavior during the session.

AzureCustomVoice

Azure custom voice configuration.

AzurePersonalVoice

Azure personal voice configuration.

AzureSemanticDetection

Azure semantic end-of-utterance detection (default).

AzureSemanticDetectionEn

Azure semantic end-of-utterance detection (English-optimized).

AzureSemanticDetectionMultilingual

Azure semantic end-of-utterance detection (multilingual).

AzureSemanticVad

Server Speech Detection (Azure semantic VAD, default variant).

AzureSemanticVadEn

Server Speech Detection (Azure semantic VAD, English-only).

AzureSemanticVadMultilingual

Server Speech Detection (Azure semantic VAD, multilingual).

AzureStandardVoice

Azure standard voice configuration.

AzureVoice

Base for Azure voice configurations.

Background

Defines a video background, either a solid color or an image URL (mutually exclusive).

CachedTokenDetails

Details of cached token usage.

ClientEvent

A voicelive client event.

ClientEventConversationItemCreate

Add a new Item to the Conversation's context, including messages, function calls, and function call responses. This event can be used both to populate a "history" of the conversation and to add new items mid-stream, but has the current limitation that it cannot populate assistant audio messages. If successful, the server will respond with a conversation.item.created event, otherwise an error event will be sent.

ClientEventConversationItemDelete

Send this event when you want to remove any item from the conversation history. The server will respond with a conversation.item.deleted event, unless the item does not exist in the conversation history, in which case the server will respond with an error.

ClientEventConversationItemRetrieve

Send this event when you want to retrieve the server's representation of a specific item in the conversation history. This is useful, for example, to inspect user audio after noise cancellation and VAD. The server will respond with a conversation.item.retrieved event, unless the item does not exist in the conversation history, in which case the server will respond with an error.

ClientEventConversationItemTruncate

Send this event to truncate a previous assistant message’s audio. The server will produce audio faster than realtime, so this event is useful when the user interrupts to truncate audio that has already been sent to the client but not yet played. This will synchronize the server's understanding of the audio with the client's playback. Truncating audio will delete the server-side text transcript to ensure there is no text in the context that hasn't been heard by the user. If successful, the server will respond with a conversation.item.truncated event.
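The truncation point is expressed in milliseconds of played audio. A minimal sketch of computing it from played samples, assuming the default 24kHz pcm16 output rate described in this reference; the wire-level field names below (item_id, content_index, audio_end_ms) follow the realtime-style event naming used here and should be verified against the ClientEventConversationItemTruncate type:

```typescript
// Sketch: build a conversation.item.truncate payload at the point the
// client actually played. Field names are assumptions drawn from the
// event naming in this reference, not a confirmed SDK signature.
const SAMPLE_RATE_HZ = 24000; // default pcm16 output sampling rate

function buildTruncateEvent(itemId: string, samplesPlayed: number) {
  return {
    type: "conversation.item.truncate" as const,
    item_id: itemId,
    content_index: 0, // first (audio) content part -- assumption
    audio_end_ms: Math.floor((samplesPlayed / SAMPLE_RATE_HZ) * 1000),
  };
}
```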

ClientEventInputAudioBufferAppend

Send this event to append audio bytes to the input audio buffer. The audio buffer is temporary storage you can write to and later commit. In Server VAD mode, the audio buffer is used to detect speech and the server will decide when to commit. When Server VAD is disabled, you must commit the audio buffer manually.

The client may choose how much audio to place in each event, up to a maximum of 15 MiB; for example, streaming smaller chunks from the client may allow the VAD to be more responsive. Unlike most other client events, the server will not send a confirmation response to this event.
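A minimal sketch of splitting captured PCM bytes into input_audio_buffer.append payloads. The event shape follows the event names in this reference; the chunk size is an illustrative choice, well under the 15 MiB per-event limit, and the exact SDK send method is not shown here:

```typescript
// Sketch: chunk raw audio bytes into base64-encoded append events.
// MAX_CHUNK is an assumption chosen to keep server VAD responsive.
const MAX_CHUNK = 32 * 1024; // 32 KiB per event

interface InputAudioBufferAppend {
  type: "input_audio_buffer.append";
  audio: string; // base64-encoded audio bytes
}

function toAppendEvents(pcm: Uint8Array): InputAudioBufferAppend[] {
  const events: InputAudioBufferAppend[] = [];
  for (let offset = 0; offset < pcm.length; offset += MAX_CHUNK) {
    const chunk = pcm.subarray(offset, offset + MAX_CHUNK);
    events.push({
      type: "input_audio_buffer.append",
      audio: Buffer.from(chunk).toString("base64"),
    });
  }
  return events;
}
```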

ClientEventInputAudioBufferClear

Send this event to clear the audio bytes in the buffer. The server will respond with an input_audio_buffer.cleared event.

ClientEventInputAudioBufferCommit

Send this event to commit the user input audio buffer, which will create a new user message item in the conversation. This event will produce an error if the input audio buffer is empty. When in Server VAD mode, the client does not need to send this event, the server will commit the audio buffer automatically. Committing the input audio buffer will trigger input audio transcription (if enabled in session configuration), but it will not create a response from the model. The server will respond with an input_audio_buffer.committed event.

ClientEventInputAudioClear

Clears all input audio currently being streamed.

ClientEventInputAudioTurnAppend

Appends audio data to an ongoing input turn.

ClientEventInputAudioTurnCancel

Cancels an in-progress input audio turn.

ClientEventInputAudioTurnEnd

Marks the end of an audio input turn.

ClientEventInputAudioTurnStart

Indicates the start of a new audio input turn.

ClientEventResponseCancel

Send this event to cancel an in-progress response. The server will respond with a response.cancelled event or an error if there is no response to cancel.

ClientEventResponseCreate

This event instructs the server to create a Response, which means triggering model inference. When in Server VAD mode, the server will create Responses automatically. A Response will include at least one Item, and may have two, in which case the second will be a function call. These Items will be appended to the conversation history. The server will respond with a response.created event, events for Items and content created, and finally a response.done event to indicate the Response is complete. The response.create event includes inference configuration such as instructions and temperature. These fields will override the Session's configuration for this Response only.
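A sketch of a response.create payload that overrides session defaults for a single response. The property names below (response, instructions, temperature) are assumptions consistent with the description above; check them against ResponseCreateParams before relying on them:

```typescript
// Sketch: per-response override of session configuration. These fields
// apply to this Response only, per the description above; field names
// are assumptions to be verified against ResponseCreateParams.
const responseCreate = {
  type: "response.create" as const,
  response: {
    instructions: "Answer in one short sentence.",
    temperature: 0.6,
  },
};
```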

ClientEventSessionAvatarConnect

Sent when the client connects and provides its SDP (Session Description Protocol) for avatar-related media negotiation.

ClientEventSessionUpdate

Send this event to update the session’s default configuration. The client may send this event at any time to update any field, except for voice. However, note that once a session has been initialized with a particular model, it can’t be changed to another model using session.update. When the server receives a session.update, it will respond with a session.updated event showing the full, effective configuration. Only the fields that are present are updated. To clear a field like instructions, pass an empty string.
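A sketch of a session.update payload illustrating the partial-update semantics described above: only the fields present are changed, and an empty string clears a text field such as instructions. The turn_detection field name and its "server_vad" discriminator are assumptions based on the TurnDetection and ServerVad types in this reference:

```typescript
// Sketch: partial session update. Only the listed fields change; an
// empty string clears instructions. Field names are assumptions to be
// checked against the RequestSession / TurnDetection types.
const sessionUpdate = {
  type: "session.update" as const,
  session: {
    instructions: "", // clears previously set instructions
    turn_detection: { type: "server_vad" }, // assumption: discriminator value
  },
};
```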

ConnectOptions
ConnectedEventArgs

Arguments provided when a connection is established

ConnectionContext

Context information provided to connection-related handlers

ContentPart

Base for any content part; discriminated by type.

ConversationItemBase

The item to add to the conversation.

ConversationRequestItem

Base for any conversation request item; discriminated by type.

CreateSessionOptions
DisconnectedEventArgs

Arguments provided when a connection is lost

EouDetection

Top-level union for end-of-utterance (EOU) semantic detection configuration.

ErrorEventArgs

Arguments provided when an error occurs

ErrorResponse

Standard error response envelope.

FunctionCallItem

A function call item within a conversation.

FunctionCallOutputItem

A function call output item within a conversation.

FunctionTool

The definition of a function tool as used by the voicelive endpoint.

IceServer

ICE server configuration for WebRTC connection negotiation.

InputAudioContentPart

Input audio content part.

InputTextContentPart

Input text content part.

InputTokenDetails

Details of input token usage.

InterimResponseConfigBase

Base model for interim response configuration.

LlmInterimResponseConfig

Configuration for LLM-based interim response generation. Uses LLM to generate context-aware interim responses when any trigger condition is met.

LogProbProperties

A single log probability entry for a token.

MCPApprovalResponseRequestItem

A request item that represents a response to an MCP approval request.

MCPServer

The definition of an MCP server as used by the voicelive endpoint.

MCPTool

Represents a mcp tool definition.

MessageContentPart

Base for any message content part; discriminated by type.

MessageItem

A message item within a conversation.

OpenAIVoice

OpenAI voice configuration with explicit type field.

This provides a unified interface for OpenAI voices, complementing the existing string-based OAIVoice for backward compatibility.

OutputTextContentPart

Output text content part.

OutputTokenDetails

Details of output token usage.

RequestAudioContentPart

An audio content part for a request. This is supported only by realtime models (e.g., gpt-realtime). For text-based models, use input_text instead.

RequestImageContentPart

Input image content part.

RequestSession

Base for session configuration shared between request and response.

RequestTextContentPart

A text content part for a request.

Response

The response resource.

ResponseAudioContentPart

An audio content part for a response.

ResponseCancelledDetails

Details for a cancelled response.

ResponseCreateParams

Create a new VoiceLive response with these parameters

ResponseFailedDetails

Details for a failed response.

ResponseFunctionCallItem

A function call item within a conversation.

ResponseFunctionCallOutputItem

A function call output item within a conversation.

ResponseIncompleteDetails

Details for an incomplete response.

ResponseItem

Base for any response item; discriminated by type.

ResponseMCPApprovalRequestItem

A response item that represents a request for approval to call an MCP tool.

ResponseMCPApprovalResponseItem

A response item that represents a response to an MCP approval request.

ResponseMCPCallItem

A response item that represents a call to an MCP tool.

ResponseMCPListToolItem

A response item that lists the tools available on an MCP server.

ResponseMessageItem

Base type for message item within a conversation.

ResponseSession

Base for session configuration in the response.

ResponseStatusDetails

Base for all non-success response details.

ResponseTextContentPart

A text content part for a response.

SendEventOptions
ServerEvent

A voicelive server event.

ServerEventConversationItemCreated

Returned when a conversation item is created. There are several scenarios that produce this event:

  • The server is generating a Response, which if successful will produce either one or two Items, which will be of type message (role assistant) or type function_call.
  • The input audio buffer has been committed, either by the client or the server (in server_vad mode). The server will take the content of the input audio buffer and add it to a new user message Item.
  • The client has sent a conversation.item.create event to add a new Item to the Conversation.
ServerEventConversationItemDeleted

Returned when an item in the conversation is deleted by the client with a conversation.item.delete event. This event is used to synchronize the server's understanding of the conversation history with the client's view.

ServerEventConversationItemInputAudioTranscriptionCompleted

This event is the output of audio transcription for user audio written to the user audio buffer. Transcription begins when the input audio buffer is committed by the client or server (in server_vad mode). Transcription runs asynchronously with Response creation, so this event may come before or after the Response events. VoiceLive API models accept audio natively, and thus input transcription is a separate process run on a separate ASR (Automatic Speech Recognition) model. The transcript may diverge somewhat from the model's interpretation, and should be treated as a rough guide.

ServerEventConversationItemInputAudioTranscriptionDelta

Returned when the text value of an input audio transcription content part is updated.

ServerEventConversationItemInputAudioTranscriptionFailed

Returned when input audio transcription is configured, and a transcription request for a user message failed. These events are separate from other error events so that the client can identify the related Item.

ServerEventConversationItemRetrieved

Returned when a conversation item is retrieved with conversation.item.retrieve.

ServerEventConversationItemTruncated

Returned when an earlier assistant audio message item is truncated by the client with a conversation.item.truncate event. This event is used to synchronize the server's understanding of the audio with the client's playback. This action will truncate the audio and remove the server-side text transcript to ensure there is no text in the context that hasn't been heard by the user.

ServerEventError

Returned when an error occurs, which could be a client problem or a server problem. Most errors are recoverable and the session will stay open; we recommend that implementers monitor and log error messages by default.

ServerEventErrorDetails

Details of the error.

ServerEventInputAudioBufferCleared

Returned when the input audio buffer is cleared by the client with an input_audio_buffer.clear event.

ServerEventInputAudioBufferCommitted

Returned when an input audio buffer is committed, either by the client or automatically in server VAD mode. The item_id property is the ID of the user message item that will be created, thus a conversation.item.created event will also be sent to the client.

ServerEventInputAudioBufferSpeechStarted

Sent by the server when in server_vad mode to indicate that speech has been detected in the audio buffer. This can happen any time audio is added to the buffer (unless speech is already detected). The client may want to use this event to interrupt audio playback or provide visual feedback to the user. The client should expect to receive an input_audio_buffer.speech_stopped event when speech stops. The item_id property is the ID of the user message item that will be created when speech stops and will also be included in the input_audio_buffer.speech_stopped event (unless the client manually commits the audio buffer during VAD activation).

ServerEventInputAudioBufferSpeechStopped

Returned in server_vad mode when the server detects the end of speech in the audio buffer. The server will also send a conversation.item.created event with the user message item that is created from the audio buffer.
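Barge-in handling for the two speech events above can be sketched as a plain dispatch on the wire-level type field. The SDK's typed handlers (VoiceLiveSessionHandlers) wrap the same events, but their exact handler names are not shown in this listing, so this sketch stays at the protocol level; PlaybackControl is a hypothetical local interface:

```typescript
// Sketch: interrupt local playback when server VAD detects user speech.
// PlaybackControl is a hypothetical client-side abstraction, not part of
// the @azure/ai-voicelive package.
type PlaybackControl = { stop(): void; clearQueue(): void };

function handleServerEvent(event: { type: string }, playback: PlaybackControl): string {
  switch (event.type) {
    case "input_audio_buffer.speech_started":
      // User started talking: stop playback and drop queued audio.
      playback.stop();
      playback.clearQueue();
      return "interrupted";
    case "input_audio_buffer.speech_stopped":
      // Server will commit the buffer and create a user item automatically.
      return "awaiting-response";
    default:
      return "ignored";
  }
}
```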

ServerEventMcpListToolsCompleted

MCP list tools completed message.

ServerEventMcpListToolsFailed

MCP list tools failed message.

ServerEventMcpListToolsInProgress

MCP list tools in progress message.

ServerEventResponseAnimationBlendshapeDelta

Represents a delta update of blendshape animation frames for a specific output of a response.

ServerEventResponseAnimationBlendshapeDone

Indicates the completion of blendshape animation processing for a specific output of a response.

ServerEventResponseAnimationVisemeDelta

Represents a viseme ID delta update for animation based on audio.

ServerEventResponseAnimationVisemeDone

Indicates completion of viseme animation delivery for a response.

ServerEventResponseAudioDelta

Returned when the model-generated audio is updated.

ServerEventResponseAudioDone

Returned when the model-generated audio is done. Also emitted when a Response is interrupted, incomplete, or cancelled.

ServerEventResponseAudioTimestampDelta

Represents a word-level audio timestamp delta for a response.

ServerEventResponseAudioTimestampDone

Indicates completion of audio timestamp delivery for a response.

ServerEventResponseAudioTranscriptDelta

Returned when the model-generated transcription of audio output is updated.

ServerEventResponseAudioTranscriptDone

Returned when the model-generated transcription of audio output is done streaming. Also emitted when a Response is interrupted, incomplete, or cancelled.

ServerEventResponseContentPartAdded

Returned when a new content part is added to an assistant message item during response generation.

ServerEventResponseContentPartDone

Returned when a content part is done streaming in an assistant message item. Also emitted when a Response is interrupted, incomplete, or cancelled.

ServerEventResponseCreated

Returned when a new Response is created. The first event of response creation, where the response is in an initial state of in_progress.

ServerEventResponseDone

Returned when a Response is done streaming. Always emitted, no matter the final state. The Response object included in the response.done event will include all output Items in the Response but will omit the raw audio data.

ServerEventResponseFunctionCallArgumentsDelta

Returned when the model-generated function call arguments are updated.

ServerEventResponseFunctionCallArgumentsDone

Returned when the model-generated function call arguments are done streaming. Also emitted when a Response is interrupted, incomplete, or cancelled.

ServerEventResponseMcpCallArgumentsDelta

Represents a delta update of the arguments for an MCP tool call.

ServerEventResponseMcpCallArgumentsDone

Indicates the completion of the arguments for an MCP tool call.

ServerEventResponseMcpCallCompleted

Indicates the MCP call has completed.

ServerEventResponseMcpCallFailed

Indicates the MCP call has failed.

ServerEventResponseMcpCallInProgress

Indicates the MCP call is running.

ServerEventResponseOutputItemAdded

Returned when a new Item is created during Response generation.

ServerEventResponseOutputItemDone

Returned when an Item is done streaming. Also emitted when a Response is interrupted, incomplete, or cancelled.

ServerEventResponseTextDelta

Returned when the text value of a "text" content part is updated.

ServerEventResponseTextDone

Returned when the text value of a "text" content part is done streaming. Also emitted when a Response is interrupted, incomplete, or cancelled.

ServerEventSessionAvatarConnecting

Sent when the server is in the process of establishing an avatar media connection and provides its SDP answer.

ServerEventSessionCreated

Returned when a Session is created. Emitted automatically when a new connection is established as the first server event. This event will contain the default Session configuration.

ServerEventSessionUpdated

Returned when a session is updated with a session.update event, unless there is an error.

ServerVad

Base model for VAD-based turn detection.

SessionBase

VoiceLive session object configuration.

SessionContext

Context information provided to session-related handlers

StartSessionOptions
StaticInterimResponseConfig

Configuration for static interim response generation. Randomly selects from configured texts when any trigger condition is met.

SystemMessageItem

A system message item within a conversation.

TokenUsage

Overall usage statistics for a response.

Tool

The base representation of a voicelive tool definition.

ToolChoiceFunctionSelection

The representation of a voicelive tool_choice selecting a named function tool.

ToolChoiceSelection

A base representation for a voicelive tool_choice selecting a named tool.

TurnDetection

Top-level union for turn detection configuration.

TurnOptions
UserMessageItem

A user message item within a conversation.

VideoCrop

Defines a video crop rectangle using top-left and bottom-right coordinates.

VideoParams

Video streaming parameters for avatar.

VideoResolution

Resolution of the video feed in pixels.

VoiceLiveClientOptions
VoiceLiveErrorDetails

Error object returned in case of API failure.

VoiceLiveSessionHandlers

Handler functions for VoiceLive session events following Azure SDK patterns.

ALL handlers are optional - implement only the events you care about! Each handler receives strongly-typed event data and context information.
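Because every handler is optional, a client can supply only the handlers it cares about. The handler property names below (onError, onResponseDone) are illustrative assumptions; check the VoiceLiveSessionHandlers interface for the real names and event payload types:

```typescript
// Sketch: a partial handlers object. Property names are hypothetical;
// consult VoiceLiveSessionHandlers for the actual interface.
const handlers = {
  onError: (e: { message: string }) => console.error("voicelive error:", e.message),
  onResponseDone: (r: { id?: string }) => console.log("response complete", r.id),
};
```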

VoiceLiveSessionOptions
VoiceLiveSubscription

Represents an active subscription to VoiceLive session events

Type Aliases

AnimationOutputType

Specifies the types of animation data to output.
KnownAnimationOutputType can be used interchangeably with AnimationOutputType, this enum contains the known values that the service supports.

Known values supported by the service

blendshapes: Blendshapes output type.
viseme_id: Viseme ID output type.

AudioTimestampType

Output timestamp types supported in audio response content.
KnownAudioTimestampType can be used interchangeably with AudioTimestampType, this enum contains the known values that the service supports.

Known values supported by the service

word: Timestamps per word in the output audio.

AvatarConfigTypes

Avatar config types
KnownAvatarConfigTypes can be used interchangeably with AvatarConfigTypes, this enum contains the known values that the service supports.

Known values supported by the service

video-avatar: Video avatar
photo-avatar: Photo avatar

AvatarOutputProtocol

Avatar config output protocols
KnownAvatarOutputProtocol can be used interchangeably with AvatarOutputProtocol, this enum contains the known values that the service supports.

Known values supported by the service

webrtc: WebRTC protocol, output the audio/video streams via WebRTC
websocket: WebSocket protocol, output the video frames over WebSocket

AzureVoiceType

Union of all supported Azure voice types.
KnownAzureVoiceType can be used interchangeably with AzureVoiceType, this enum contains the known values that the service supports.

Known values supported by the service

azure-custom: Azure custom voice.
azure-standard: Azure standard voice.
azure-personal: Azure personal voice.

AzureVoiceUnion

Alias for AzureVoiceUnion

ClientEventType

Client event types used in VoiceLive protocol.
KnownClientEventType can be used interchangeably with ClientEventType, this enum contains the known values that the service supports.

Known values supported by the service

session.update
input_audio_buffer.append
input_audio_buffer.commit
input_audio_buffer.clear
input_audio.turn.start
input_audio.turn.append
input_audio.turn.end
input_audio.turn.cancel
input_audio.clear
conversation.item.create
conversation.item.retrieve
conversation.item.truncate
conversation.item.delete
response.create
response.cancel
session.avatar.connect
mcp_approval_response

ClientEventUnion

Alias for ClientEventUnion

ContentPartType

Type of ContentPartType

ContentPartUnion

Alias for ContentPartUnion

ConversationRequestItemUnion

Alias for ConversationRequestItemUnion

EouDetectionUnion

Alias for EouDetectionUnion

EouThresholdLevel

Threshold level settings for Azure semantic end-of-utterance detection.
KnownEouThresholdLevel can be used interchangeably with EouThresholdLevel, this enum contains the known values that the service supports.

Known values supported by the service

low: Low sensitivity threshold level.
medium: Medium sensitivity threshold level.
high: High sensitivity threshold level.
default: Default sensitivity threshold level.

InputAudioFormat

Input audio format types supported.
KnownInputAudioFormat can be used interchangeably with InputAudioFormat, this enum contains the known values that the service supports.

Known values supported by the service

pcm16: 16-bit PCM audio format at default sampling rate (24kHz)
g711_ulaw: G.711 μ-law (mu-law) audio format at 8kHz sampling rate
g711_alaw: G.711 A-law audio format at 8kHz sampling rate
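Browser audio capture (Web Audio) yields float samples in [-1, 1], so feeding the pcm16 input format requires a conversion to 16-bit little-endian PCM. A minimal sketch (the scaling choice is a common convention, not mandated by this reference):

```typescript
// Sketch: convert float samples (-1..1) to 16-bit little-endian PCM
// bytes for the pcm16 input format. Clamps out-of-range samples and
// scales asymmetrically so +/-1.0 both fit the int16 range.
function floatTo16BitPCM(samples: Float32Array): Uint8Array {
  const out = new Uint8Array(samples.length * 2);
  const view = new DataView(out.buffer);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return out;
}
```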

InterimResponseConfig

Union of interim response configuration types.

InterimResponseConfigBaseUnion

Alias for InterimResponseConfigBaseUnion

InterimResponseConfigType

Interim response configuration types.
KnownInterimResponseConfigType can be used interchangeably with InterimResponseConfigType, this enum contains the known values that the service supports.

Known values supported by the service

static_interim_response: Static interim response configuration type.
llm_interim_response: LLM-based interim response configuration type.

InterimResponseTrigger

Triggers that can activate interim response generation.
KnownInterimResponseTrigger can be used interchangeably with InterimResponseTrigger, this enum contains the known values that the service supports.

Known values supported by the service

latency: Trigger interim response when response latency exceeds threshold.
tool: Trigger interim response when a tool call is being executed.

ItemParamStatus

Indicates the processing status of an item or parameter.
KnownItemParamStatus can be used interchangeably with ItemParamStatus, this enum contains the known values that the service supports.

Known values supported by the service

completed: Item or parameter has been fully processed and is complete.
incomplete: Item or parameter is not yet complete.

ItemType

Type of ItemType

MCPApprovalType

The available set of MCP approval types.
KnownMCPApprovalType can be used interchangeably with MCPApprovalType, this enum contains the known values that the service supports.

Known values supported by the service

never: Approval is never required.
always: Approval is always required.

MessageContentPartUnion

Alias for MessageContentPartUnion

MessageItemUnion

Alias for MessageItemUnion

MessageRole

Type of MessageRole

Modality

Supported modalities for the session.
KnownModality can be used interchangeably with Modality, this enum contains the known values that the service supports.

Known values supported by the service

text: Text modality.
audio: Audio modality.
animation: Animation modality.
avatar: Avatar modality.

OAIVoice

Supported OpenAI voice names (string enum).
KnownOAIVoice can be used interchangeably with OAIVoice, this enum contains the known values that the service supports.

Known values supported by the service

alloy: Alloy voice.
ash: Ash voice.
ballad: Ballad voice.
coral: Coral voice.
echo: Echo voice.
sage: Sage voice.
shimmer: Shimmer voice.
verse: Verse voice.
marin: Marin voice.
cedar: Cedar voice.

OutputAudioFormat

Output audio format types supported.
KnownOutputAudioFormat can be used interchangeably with OutputAudioFormat, this enum contains the known values that the service supports.

Known values supported by the service

pcm16: 16-bit PCM audio format at default sampling rate (24kHz)
pcm16_8000hz: 16-bit PCM audio format at 8kHz sampling rate
pcm16_16000hz: 16-bit PCM audio format at 16kHz sampling rate
g711_ulaw: G.711 μ-law (mu-law) audio format at 8kHz sampling rate
g711_alaw: G.711 A-law audio format at 8kHz sampling rate

PersonalVoiceModels

PersonalVoice models
KnownPersonalVoiceModels can be used interchangeably with PersonalVoiceModels, this enum contains the known values that the service supports.

Known values supported by the service

DragonLatestNeural: Use the latest Dragon model.
PhoenixLatestNeural: Use the latest Phoenix model.
PhoenixV2Neural: Use the Phoenix V2 model.

PhotoAvatarBaseModes

Photo avatar base modes
KnownPhotoAvatarBaseModes can be used interchangeably with PhotoAvatarBaseModes, this enum contains the known values that the service supports.

Known values supported by the service

vasa-1: VASA-1 model

ReasoningEffort

Constrains effort on reasoning for reasoning models. Check model documentation for supported values for each model. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.
KnownReasoningEffort can be used interchangeably with ReasoningEffort, this enum contains the known values that the service supports.

Known values supported by the service

none: No reasoning effort.
minimal: Minimal reasoning effort.
low: Low reasoning effort - faster responses with less reasoning.
medium: Medium reasoning effort - balanced between speed and reasoning depth.
high: High reasoning effort - more thorough reasoning, may take longer.
xhigh: Extra high reasoning effort - maximum reasoning depth.

RequestImageContentPartDetail

Specifies an image's detail level. Can be 'auto', 'low', 'high', or an unknown future value.
KnownRequestImageContentPartDetail can be used interchangeably with RequestImageContentPartDetail, this enum contains the known values that the service supports.

Known values supported by the service

auto: Automatically select an appropriate detail level.
low: Use a lower detail level to reduce bandwidth or cost.
high: Use a higher detail level—potentially more resource-intensive.

ResponseItemStatus

Indicates the processing status of a response item.
KnownResponseItemStatus can be used interchangeably with ResponseItemStatus, this enum contains the known values that the service supports.

Known values supported by the service

in_progress: Item that is in progress.
completed: Item has been fully processed and is complete.
incomplete: Item has been processed but is incomplete.

ResponseItemUnion

Alias for ResponseItemUnion

ResponseStatus

Terminal status of a response.
KnownResponseStatus can be used interchangeably with ResponseStatus, this enum contains the known values that the service supports.

Known values supported by the service

completed
cancelled
failed
incomplete
in_progress

ResponseStatusDetailsUnion

Alias for ResponseStatusDetailsUnion

ServerEventType

Server event types used in VoiceLive protocol.
KnownServerEventType can be used interchangeably with ServerEventType, this enum contains the known values that the service supports.

Known values supported by the service

error
session.avatar.connecting
session.created
session.updated
conversation.item.input_audio_transcription.completed
conversation.item.input_audio_transcription.delta
conversation.item.input_audio_transcription.failed
conversation.item.created
conversation.item.retrieved
conversation.item.truncated
conversation.item.deleted
input_audio_buffer.committed
input_audio_buffer.cleared
input_audio_buffer.speech_started
input_audio_buffer.speech_stopped
response.created
response.done
response.output_item.added
response.output_item.done
response.content_part.added
response.content_part.done
response.text.delta
response.text.done
response.audio_transcript.delta
response.audio_transcript.done
response.audio.delta
response.audio.done
response.animation_blendshapes.delta
response.animation_blendshapes.done
response.audio_timestamp.delta
response.audio_timestamp.done
response.animation_viseme.delta
response.animation_viseme.done
response.function_call_arguments.delta
response.function_call_arguments.done
mcp_list_tools.in_progress
mcp_list_tools.completed
mcp_list_tools.failed
response.mcp_call_arguments.delta
response.mcp_call_arguments.done
response.mcp_call.in_progress
response.mcp_call.completed
response.mcp_call.failed

ServerEventUnion

Alias for ServerEventUnion

SessionTarget

Target for a Voice Live session, specifying either a model or an agent.

Use { model: string } for model-centric sessions where the LLM is the main actor. Use { agent: AgentSessionConfig } for agent-centric sessions where the agent is the main actor.

Example

Model-centric session

import { DefaultAzureCredential } from "@azure/identity";
import { VoiceLiveClient } from "@azure/ai-voicelive";

const credential = new DefaultAzureCredential();
const endpoint = "https://your-resource.cognitiveservices.azure.com";
const client = new VoiceLiveClient(endpoint, credential);

const session = client.createSession({ model: "gpt-4o-realtime-preview" });

Example

Agent-centric session

import { DefaultAzureCredential } from "@azure/identity";
import { VoiceLiveClient } from "@azure/ai-voicelive";

const credential = new DefaultAzureCredential();
const endpoint = "https://your-resource.cognitiveservices.azure.com";
const client = new VoiceLiveClient(endpoint, credential);

const session = client.createSession({
  agent: { agentName: "my-agent", projectName: "my-project" },
});

ToolChoice

The combined set of available representations for a voicelive tool_choice parameter, encompassing both string literal options like 'auto' and structured references to defined tools.

ToolChoiceLiteral

The available set of mode-level, string literal tool_choice options for the voicelive endpoint.
KnownToolChoiceLiteral can be used interchangeably with ToolChoiceLiteral; this enum contains the known values that the service supports.

Known values supported by the service

auto: Specifies that the model should freely determine which tool or tools, if any, to call.
none: Specifies that the model should call no tools whatsoever.
required: Specifies that the model should call at least one tool.
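
A quick sketch of narrowing a value to the three documented string literals; the type and guard below are local illustrations, not the SDK's ToolChoiceLiteral type (which, per the ToolChoice description, can also be a structured reference to a defined tool):

```typescript
// Local stand-in for the documented string-literal tool_choice options.
type ToolChoiceLiteralLike = "auto" | "none" | "required";

// Returns true only for the three documented literal values.
function isToolChoiceLiteral(value: unknown): value is ToolChoiceLiteralLike {
  return value === "auto" || value === "none" || value === "required";
}
```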

ToolChoiceSelectionUnion

Alias for ToolChoiceSelectionUnion

ToolType

The supported tool type discriminators for voicelive tools.
KnownToolType can be used interchangeably with ToolType; this enum contains the known values that the service supports.

Known values supported by the service

function
mcp

ToolUnion

Alias for ToolUnion

TurnDetectionType

The type discriminator for turn detection configurations.

TurnDetectionUnion

Alias for TurnDetectionUnion

Voice

Union of all supported voice configurations.

Enums

ConnectionState

Connection state enumeration for lifecycle management

KnownAnimationOutputType

Specifies the types of animation data to output.

KnownAudioTimestampType

Output timestamp types supported in audio response content.

KnownAvatarConfigTypes

Avatar config types

KnownAvatarOutputProtocol

Avatar config output protocols

KnownAzureVoiceType

Union of all supported Azure voice types.

KnownClientEventType

Client event types used in VoiceLive protocol.

KnownContentPartType

Known values of ContentPartType that the service accepts.

KnownEouThresholdLevel

Threshold level settings for Azure semantic end-of-utterance detection.

KnownInputAudioFormat

Input audio format types supported.

KnownInterimResponseConfigType

Interim response configuration types.

KnownInterimResponseTrigger

Triggers that can activate interim response generation.

KnownItemParamStatus

Indicates the processing status of an item or parameter.

KnownItemType

Known values of ItemType that the service accepts.

KnownMCPApprovalType

The available set of MCP approval types.

KnownMessageRole

Known values of MessageRole that the service accepts.

KnownModality

Supported modalities for the session.

KnownOAIVoice

Supported OpenAI voice names (string enum).

KnownOutputAudioFormat

Output audio format types supported.

KnownPersonalVoiceModels

PersonalVoice models

KnownPhotoAvatarBaseModes

Photo avatar base modes

KnownReasoningEffort

Constrains effort on reasoning for reasoning models. Check model documentation for supported values for each model. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.

KnownRequestImageContentPartDetail

Specifies an image's detail level. Can be 'auto', 'low', 'high', or an unknown future value.

KnownResponseItemStatus

Indicates the processing status of a response item.

KnownResponseStatus

Terminal status of a response.

KnownServerEventType

Server event types used in VoiceLive protocol.

KnownToolChoiceLiteral

The available set of mode-level, string literal tool_choice options for the voicelive endpoint.

KnownToolType

The supported tool type discriminators for voicelive tools.

KnownTurnDetectionType

Known values of TurnDetectionType that the service accepts.

VoiceLiveErrorCodes

Error codes for Voice Live WebSocket operations

Functions

classifyConnectionError(unknown)

Classifies connection errors

classifyProtocolError(Error, string)

Classifies protocol errors

isAgentSessionTarget(SessionTarget)

Type guard to check if a SessionTarget specifies an agent session.

isModelSessionTarget(SessionTarget)

Type guard to check if a SessionTarget specifies a model session.

Function Details

classifyConnectionError(unknown)

Classifies connection errors

function classifyConnectionError(error: unknown): VoiceLiveConnectionError

Parameters

error

unknown

Returns

VoiceLiveConnectionError

classifyProtocolError(Error, string)

Classifies protocol errors

function classifyProtocolError(error: Error, messageType: string): VoiceLiveProtocolError

Parameters

error

Error

messageType

string

Returns

VoiceLiveProtocolError

isAgentSessionTarget(SessionTarget)

Type guard to check if a SessionTarget specifies an agent session.

function isAgentSessionTarget(target: SessionTarget): target is { agent: AgentSessionConfig }

Parameters

target
SessionTarget

The session target to check

Returns

target is { agent: AgentSessionConfig }

True if the target specifies an agent session

isModelSessionTarget(SessionTarget)

Type guard to check if a SessionTarget specifies a model session.

function isModelSessionTarget(target: SessionTarget): target is { model: string }

Parameters

target
SessionTarget

The session target to check

Returns

target is { model: string }

True if the target specifies a model session
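
The two guards above let you branch on the two SessionTarget shapes with full type narrowing. The sketch below mirrors the documented shapes with local stand-in types; in real code, import SessionTarget, isAgentSessionTarget, and isModelSessionTarget from "@azure/ai-voicelive" instead:

```typescript
// Local stand-ins mirroring the documented SessionTarget shapes:
// { model: string } for model-centric sessions,
// { agent: AgentSessionConfig } for agent-centric sessions.
interface AgentSessionConfigLike {
  agentName: string;
  projectName: string;
}

type SessionTargetLike = { model: string } | { agent: AgentSessionConfigLike };

// Illustrative re-implementation of the agent-session type guard.
function isAgentTarget(
  target: SessionTargetLike,
): target is { agent: AgentSessionConfigLike } {
  return "agent" in target;
}

// After the guard, TypeScript narrows `target` to the matching shape.
function describeTarget(target: SessionTargetLike): string {
  if (isAgentTarget(target)) {
    return `agent session: ${target.agent.agentName}`;
  }
  return `model session: ${target.model}`;
}
```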