@azure/ai-voicelive package
Classes
| VoiceLiveAuthenticationError |
Authentication error class for Voice Live operations |
| VoiceLiveClient |
The VoiceLive client provides session management for real-time conversational AI capabilities. This client acts as a factory for creating VoiceLiveSession instances, which handle the actual WebSocket connections and real-time interactions with the service. |
| VoiceLiveConnectionError |
Base error class for Voice Live WebSocket operations |
| VoiceLiveError |
General Voice Live error class |
| VoiceLiveProtocolError |
Protocol error class for Voice Live message operations |
| VoiceLiveSession |
Represents a WebSocket-based session for real-time voice communication with the Azure VoiceLive service. This class manages the connection, handles real-time communication, and provides access to all interactive features including audio streaming, conversation management, and avatar control. |
Interfaces
| AgentConfig |
Configuration for the agent. |
| AgentSessionConfig |
Configuration for creating a session with an agent as the main AI actor. When using an agent session, the agent's configuration (tools, instructions, temperature, etc.) is managed in the Foundry portal, not in session code. |
| Animation |
Configuration for animation outputs including blendshapes and visemes metadata. |
| AssistantMessageItem |
An assistant message item within a conversation. |
| AudioEchoCancellation |
Echo cancellation configuration for server-side audio processing. |
| AudioInputTranscriptionOptions |
Configuration for input audio transcription. |
| AudioNoiseReduction |
Configuration for input audio noise reduction. |
| AudioStreamOptions | |
| AvatarConfig |
Configuration for avatar streaming and behavior during the session. |
| AzureCustomVoice |
Azure custom voice configuration. |
| AzurePersonalVoice |
Azure personal voice configuration. |
| AzureSemanticDetection |
Azure semantic end-of-utterance detection (default). |
| AzureSemanticDetectionEn |
Azure semantic end-of-utterance detection (English-optimized). |
| AzureSemanticDetectionMultilingual |
Azure semantic end-of-utterance detection (multilingual). |
| AzureSemanticVad |
Server Speech Detection (Azure semantic VAD, default variant). |
| AzureSemanticVadEn |
Server Speech Detection (Azure semantic VAD, English-only). |
| AzureSemanticVadMultilingual |
Server Speech Detection (Azure semantic VAD, multilingual). |
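The three Azure semantic VAD variants above form the service's turn detection discriminated union (see TurnDetection). A minimal local sketch of switching on the variant; the `type` discriminator strings below are assumptions for illustration, not values taken from this reference (the real ones come from KnownTurnDetectionType):

```typescript
// Local sketch only: the discriminator strings are hypothetical, not the
// SDK's actual KnownTurnDetectionType values.
type TurnDetectionSketch =
  | { type: "azure_semantic_vad" }
  | { type: "azure_semantic_vad_en" }
  | { type: "azure_semantic_vad_multilingual" };

// Exhaustive switch over the union; TypeScript checks all variants are handled.
function describeTurnDetection(td: TurnDetectionSketch): string {
  switch (td.type) {
    case "azure_semantic_vad":
      return "default variant";
    case "azure_semantic_vad_en":
      return "English-only variant";
    case "azure_semantic_vad_multilingual":
      return "multilingual variant";
  }
}
```

The same narrowing pattern applies to the EouDetection union further down.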
| AzureStandardVoice |
Azure standard voice configuration. |
| AzureVoice |
Base for Azure voice configurations. |
| Background |
Defines a video background, either a solid color or an image URL (mutually exclusive). |
| CachedTokenDetails |
Details of cached token usage. |
| ClientEvent |
A voicelive client event. |
| ClientEventConversationItemCreate |
Add a new Item to the Conversation's context, including messages, function
calls, and function call responses. This event can be used both to populate a
"history" of the conversation and to add new items mid-stream, but has the
current limitation that it cannot populate assistant audio messages.
If successful, the server will respond with a |
| ClientEventConversationItemDelete |
Send this event when you want to remove any item from the conversation
history. The server will respond with a |
| ClientEventConversationItemRetrieve |
Send this event when you want to retrieve the server's representation of a specific item in the conversation history. This is useful, for example, to inspect user audio after noise cancellation and VAD.
The server will respond with a |
| ClientEventConversationItemTruncate |
Send this event to truncate a previous assistant message's audio. The server
will produce audio faster than realtime, so this event is useful when the user
interrupts to truncate audio that has already been sent to the client but not
yet played. This will synchronize the server's understanding of the audio with
the client's playback.
Truncating audio will delete the server-side text transcript to ensure there
is no text in the context that hasn't been heard by the user.
If successful, the server will respond with a |
| ClientEventInputAudioBufferAppend |
Send this event to append audio bytes to the input audio buffer. The audio buffer is temporary storage you can write to and later commit. In Server VAD mode, the audio buffer is used to detect speech and the server will decide when to commit. When Server VAD is disabled, you must commit the audio buffer manually. The client may choose how much audio to place in each event up to a maximum of 15 MiB; for example, streaming smaller chunks from the client may allow the VAD to be more responsive. Unlike most other client events, the server will not send a confirmation response to this event. |
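The per-event size cap above suggests chunking captured audio before appending. A minimal sketch: the wire name `input_audio_buffer.append` is an assumption modeled on the dotted client event names in this reference (e.g. session.update), and base64 encoding of the bytes is elided.

```typescript
// Hypothetical append-event shape; the "type" string is an assumption.
// Audio would be base64-encoded before sending; that step is elided here.
interface AppendEventSketch {
  type: "input_audio_buffer.append";
  audio: Uint8Array; // raw PCM bytes for this chunk
}

// Split raw PCM into smaller append events. Each event must stay under the
// documented 15 MiB cap; smaller chunks can make server VAD more responsive.
function chunkAudio(pcm: Uint8Array, chunkBytes: number): AppendEventSketch[] {
  if (chunkBytes <= 0) throw new RangeError("chunkBytes must be positive");
  const events: AppendEventSketch[] = [];
  for (let offset = 0; offset < pcm.length; offset += chunkBytes) {
    events.push({
      type: "input_audio_buffer.append",
      audio: pcm.subarray(offset, offset + chunkBytes),
    });
  }
  return events;
}
```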
| ClientEventInputAudioBufferClear |
Send this event to clear the audio bytes in the buffer. The server will
respond with an |
| ClientEventInputAudioBufferCommit |
Send this event to commit the user input audio buffer, which will create a
new user message item in the conversation. This event will produce an error
if the input audio buffer is empty. When in Server VAD mode, the client does
not need to send this event; the server will commit the audio buffer
automatically.
Committing the input audio buffer will trigger input audio transcription
(if enabled in session configuration), but it will not create a response
from the model. The server will respond with an |
| ClientEventInputAudioClear |
Clears all input audio currently being streamed. |
| ClientEventInputAudioTurnAppend |
Appends audio data to an ongoing input turn. |
| ClientEventInputAudioTurnCancel |
Cancels an in-progress input audio turn. |
| ClientEventInputAudioTurnEnd |
Marks the end of an audio input turn. |
| ClientEventInputAudioTurnStart |
Indicates the start of a new audio input turn. |
| ClientEventResponseCancel |
Send this event to cancel an in-progress response. The server will respond
with a |
| ClientEventResponseCreate |
This event instructs the server to create a Response, which means triggering
model inference. When in Server VAD mode, the server will create Responses
automatically.
A Response will include at least one Item, and may have two, in which case
the second will be a function call. These Items will be appended to the
conversation history.
The server will respond with a |
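When Server VAD is disabled, the end of a user turn is the commit-then-create sequence described above. A sketch of the two events a client would queue; the dotted wire names are assumptions modeled on the session.update client event name in this reference:

```typescript
// Hypothetical event shapes; the "type" strings are assumptions, not
// confirmed wire names from this reference.
type ManualTurnEvent =
  | { type: "input_audio_buffer.commit" }
  | { type: "response.create" };

// End a manual turn: commit creates the user message item, then
// response.create triggers model inference on the updated conversation.
function endOfTurnEvents(): ManualTurnEvent[] {
  return [{ type: "input_audio_buffer.commit" }, { type: "response.create" }];
}
```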
| ClientEventSessionAvatarConnect |
Sent when the client connects and provides its SDP (Session Description Protocol) for avatar-related media negotiation. |
| ClientEventSessionUpdate |
Send this event to update the session’s default configuration.
The client may send this event at any time to update any field,
except for |
| ConnectOptions | |
| ConnectedEventArgs |
Arguments provided when a connection is established |
| ConnectionContext |
Context information provided to connection-related handlers |
| ContentPart |
Base for any content part; discriminated by |
| ConversationItemBase |
The item to add to the conversation. |
| ConversationRequestItem |
Base for any conversation request item; discriminated by |
| CreateSessionOptions | |
| DisconnectedEventArgs |
Arguments provided when a connection is lost |
| EouDetection |
Top-level union for end-of-utterance (EOU) semantic detection configuration. |
| ErrorEventArgs |
Arguments provided when an error occurs |
| ErrorResponse |
Standard error response envelope. |
| FunctionCallItem |
A function call item within a conversation. |
| FunctionCallOutputItem |
A function call output item within a conversation. |
| FunctionTool |
The definition of a function tool as used by the voicelive endpoint. |
| IceServer |
ICE server configuration for WebRTC connection negotiation. |
| InputAudioContentPart |
Input audio content part. |
| InputTextContentPart |
Input text content part. |
| InputTokenDetails |
Details of input token usage. |
| InterimResponseConfigBase |
Base model for interim response configuration. |
| LlmInterimResponseConfig |
Configuration for LLM-based interim response generation. Uses LLM to generate context-aware interim responses when any trigger condition is met. |
| LogProbProperties |
A single log probability entry for a token. |
| MCPApprovalResponseRequestItem |
A request item that represents a response to an MCP approval request. |
| MCPServer |
The definition of an MCP server as used by the voicelive endpoint. |
| MCPTool |
Represents an MCP tool definition. |
| MessageContentPart |
Base for any message content part; discriminated by |
| MessageItem |
A message item within a conversation. |
| OpenAIVoice |
OpenAI voice configuration with explicit type field. This provides a unified interface for OpenAI voices, complementing the existing string-based OAIVoice for backward compatibility. |
| OutputTextContentPart |
Output text content part. |
| OutputTokenDetails |
Details of output token usage. |
| RequestAudioContentPart |
An audio content part for a request. This is supported only by realtime models (e.g., gpt-realtime). For text-based models, use |
| RequestImageContentPart |
Input image content part. |
| RequestSession |
Base for session configuration shared between request and response. |
| RequestTextContentPart |
A text content part for a request. |
| Response |
The response resource. |
| ResponseAudioContentPart |
An audio content part for a response. |
| ResponseCancelledDetails |
Details for a cancelled response. |
| ResponseCreateParams |
Create a new VoiceLive response with these parameters |
| ResponseFailedDetails |
Details for a failed response. |
| ResponseFunctionCallItem |
A function call item within a conversation. |
| ResponseFunctionCallOutputItem |
A function call output item within a conversation. |
| ResponseIncompleteDetails |
Details for an incomplete response. |
| ResponseItem |
Base for any response item; discriminated by |
| ResponseMCPApprovalRequestItem |
A response item that represents a request for approval to call an MCP tool. |
| ResponseMCPApprovalResponseItem |
A response item that represents a response to an MCP approval request. |
| ResponseMCPCallItem |
A response item that represents a call to an MCP tool. |
| ResponseMCPListToolItem |
A response item that lists the tools available on an MCP server. |
| ResponseMessageItem |
Base type for message item within a conversation. |
| ResponseSession |
Base for session configuration in the response. |
| ResponseStatusDetails |
Base for all non-success response details. |
| ResponseTextContentPart |
A text content part for a response. |
| SendEventOptions | |
| ServerEvent |
A voicelive server event. |
| ServerEventConversationItemCreated |
Returned when a conversation item is created. There are several scenarios that produce this event:
|
| ServerEventConversationItemDeleted |
Returned when an item in the conversation is deleted by the client with a
|
| ServerEventConversationItemInputAudioTranscriptionCompleted |
This event is the output of audio transcription for user audio written to the
user audio buffer. Transcription begins when the input audio buffer is
committed by the client or server (in |
| ServerEventConversationItemInputAudioTranscriptionDelta |
Returned when the text value of an input audio transcription content part is updated. |
| ServerEventConversationItemInputAudioTranscriptionFailed |
Returned when input audio transcription is configured, and a transcription
request for a user message failed. These events are separate from other
|
| ServerEventConversationItemRetrieved |
Returned when a conversation item is retrieved with |
| ServerEventConversationItemTruncated |
Returned when an earlier assistant audio message item is truncated by the
client with a |
| ServerEventError |
Returned when an error occurs, which could be a client problem or a server problem. Most errors are recoverable and the session will stay open; we recommend that implementers monitor and log error messages by default. |
| ServerEventErrorDetails |
Details of the error. |
| ServerEventInputAudioBufferCleared |
Returned when the input audio buffer is cleared by the client with a
|
| ServerEventInputAudioBufferCommitted |
Returned when an input audio buffer is committed, either by the client or
automatically in server VAD mode. The |
| ServerEventInputAudioBufferSpeechStarted |
Sent by the server when in |
| ServerEventInputAudioBufferSpeechStopped |
Returned in |
| ServerEventMcpListToolsCompleted |
MCP list tools completed message. |
| ServerEventMcpListToolsFailed |
MCP list tools failed message. |
| ServerEventMcpListToolsInProgress |
MCP list tools in progress message. |
| ServerEventResponseAnimationBlendshapeDelta |
Represents a delta update of blendshape animation frames for a specific output of a response. |
| ServerEventResponseAnimationBlendshapeDone |
Indicates the completion of blendshape animation processing for a specific output of a response. |
| ServerEventResponseAnimationVisemeDelta |
Represents a viseme ID delta update for animation based on audio. |
| ServerEventResponseAnimationVisemeDone |
Indicates completion of viseme animation delivery for a response. |
| ServerEventResponseAudioDelta |
Returned when the model-generated audio is updated. |
| ServerEventResponseAudioDone |
Returned when the model-generated audio is done. Also emitted when a Response is interrupted, incomplete, or cancelled. |
| ServerEventResponseAudioTimestampDelta |
Represents a word-level audio timestamp delta for a response. |
| ServerEventResponseAudioTimestampDone |
Indicates completion of audio timestamp delivery for a response. |
| ServerEventResponseAudioTranscriptDelta |
Returned when the model-generated transcription of audio output is updated. |
| ServerEventResponseAudioTranscriptDone |
Returned when the model-generated transcription of audio output is done streaming. Also emitted when a Response is interrupted, incomplete, or cancelled. |
| ServerEventResponseContentPartAdded |
Returned when a new content part is added to an assistant message item during response generation. |
| ServerEventResponseContentPartDone |
Returned when a content part is done streaming in an assistant message item. Also emitted when a Response is interrupted, incomplete, or cancelled. |
| ServerEventResponseCreated |
Returned when a new Response is created. The first event of response creation,
where the response is in an initial state of |
| ServerEventResponseDone |
Returned when a Response is done streaming. Always emitted, no matter the
final state. The Response object included in the |
| ServerEventResponseFunctionCallArgumentsDelta |
Returned when the model-generated function call arguments are updated. |
| ServerEventResponseFunctionCallArgumentsDone |
Returned when the model-generated function call arguments are done streaming. Also emitted when a Response is interrupted, incomplete, or cancelled. |
| ServerEventResponseMcpCallArgumentsDelta |
Represents a delta update of the arguments for an MCP tool call. |
| ServerEventResponseMcpCallArgumentsDone |
Indicates the completion of the arguments for an MCP tool call. |
| ServerEventResponseMcpCallCompleted |
Indicates the MCP call has completed. |
| ServerEventResponseMcpCallFailed |
Indicates the MCP call has failed. |
| ServerEventResponseMcpCallInProgress |
Indicates the MCP call is running. |
| ServerEventResponseOutputItemAdded |
Returned when a new Item is created during Response generation. |
| ServerEventResponseOutputItemDone |
Returned when an Item is done streaming. Also emitted when a Response is interrupted, incomplete, or cancelled. |
| ServerEventResponseTextDelta |
Returned when the text value of a "text" content part is updated. |
| ServerEventResponseTextDone |
Returned when the text value of a "text" content part is done streaming. Also emitted when a Response is interrupted, incomplete, or cancelled. |
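The delta/done pairing above is the general streaming pattern for these server events. A minimal reducer sketch for text content; the event shapes and dotted names (`response.text.delta` / `response.text.done`) are assumptions based on the ServerEventResponseText* entries, not confirmed wire formats:

```typescript
// Hypothetical event shapes modeled on the delta/done pattern above.
type TextStreamEvent =
  | { type: "response.text.delta"; delta: string }
  | { type: "response.text.done"; text: string };

// Accumulate streamed deltas into the final text. The ".done" event is
// assumed to carry the full text, so it is preferred if deltas were dropped.
function accumulateText(events: TextStreamEvent[]): string {
  let text = "";
  for (const ev of events) {
    if (ev.type === "response.text.delta") {
      text += ev.delta;
    } else {
      text = ev.text;
    }
  }
  return text;
}
```

The same accumulate-then-finalize shape applies to audio transcript, function call argument, and MCP call argument deltas.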
| ServerEventSessionAvatarConnecting |
Sent when the server is in the process of establishing an avatar media connection and provides its SDP answer. |
| ServerEventSessionCreated |
Returned when a Session is created. Emitted automatically when a new connection is established as the first server event. This event will contain the default Session configuration. |
| ServerEventSessionUpdated |
Returned when a session is updated with a |
| ServerVad |
Base model for VAD-based turn detection. |
| SessionBase |
VoiceLive session object configuration. |
| SessionContext |
Context information provided to session-related handlers |
| StartSessionOptions | |
| StaticInterimResponseConfig |
Configuration for static interim response generation. Randomly selects from configured texts when any trigger condition is met. |
| SystemMessageItem |
A system message item within a conversation. |
| TokenUsage |
Overall usage statistics for a response. |
| Tool |
The base representation of a voicelive tool definition. |
| ToolChoiceFunctionSelection |
The representation of a voicelive tool_choice selecting a named function tool. |
| ToolChoiceSelection |
A base representation for a voicelive tool_choice selecting a named tool. |
| TurnDetection |
Top-level union for turn detection configuration. |
| TurnOptions | |
| UserMessageItem |
A user message item within a conversation. |
| VideoCrop |
Defines a video crop rectangle using top-left and bottom-right coordinates. |
| VideoParams |
Video streaming parameters for avatar. |
| VideoResolution |
Resolution of the video feed in pixels. |
| VoiceLiveClientOptions | |
| VoiceLiveErrorDetails |
Error object returned in case of API failure. |
| VoiceLiveSessionHandlers |
Handler functions for VoiceLive session events following Azure SDK patterns. ALL handlers are optional - implement only the events you care about! Each handler receives strongly-typed event data and context information. |
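The all-optional handler pattern described above can be sketched locally; the handler names below are illustrative, not the SDK's actual VoiceLiveSessionHandlers property names:

```typescript
// Local sketch of an all-optional handlers object; property names are
// hypothetical, not taken from the SDK.
interface SessionHandlersSketch {
  onError?: (message: string) => void;
  onTextDelta?: (delta: string) => void;
}

// Dispatch safely: handlers the caller did not implement are simply skipped.
function dispatchTextDelta(h: SessionHandlersSketch, delta: string): void {
  h.onTextDelta?.(delta);
}
```

A caller implements only the events it cares about, e.g. `{ onTextDelta: (d) => process.stdout.write(d) }`, and unhandled events are dropped without error.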
| VoiceLiveSessionOptions | |
| VoiceLiveSubscription |
Represents an active subscription to VoiceLive session events |
Type Aliases
| AnimationOutputType |
Specifies the types of animation data to output. Known values supported by the service include blendshapes: Blendshapes output type. |
| AudioTimestampType |
Output timestamp types supported in audio response content. Known values supported by the service include word: Timestamps per word in the output audio. |
| AvatarConfigTypes |
Avatar config types. Known values supported by the service include video-avatar: Video avatar. |
| AvatarOutputProtocol |
Avatar config output protocols. Known values supported by the service include webrtc: WebRTC protocol; outputs the audio/video streams via WebRTC. |
| AzureVoiceType |
Union of all supported Azure voice types. Known values supported by the service include azure-custom: Azure custom voice. |
| AzureVoiceUnion |
Union of all AzureVoice variants. |
| ClientEventType |
Client event types used in VoiceLive protocol. Known values supported by the service include session.update. |
| ClientEventUnion |
Union of all ClientEvent variants. |
| ContentPartType |
Type discriminator for content parts. |
| ContentPartUnion |
Union of all ContentPart variants. |
| ConversationRequestItemUnion |
Union of all ConversationRequestItem variants. |
| EouDetectionUnion |
Union of all EouDetection variants. |
| EouThresholdLevel |
Threshold level settings for Azure semantic end-of-utterance detection. Known values supported by the service include low: Low sensitivity threshold level. |
| InputAudioFormat |
Input audio format types supported. Known values supported by the service include pcm16: 16-bit PCM audio format at the default sampling rate (24 kHz). |
| InterimResponseConfig |
Union of interim response configuration types. |
| InterimResponseConfigBaseUnion |
Union of all InterimResponseConfigBase variants. |
| InterimResponseConfigType |
Interim response configuration types. Known values supported by the service include static_interim_response: Static interim response configuration type. |
| InterimResponseTrigger |
Triggers that can activate interim response generation. Known values supported by the service include latency: Trigger an interim response when response latency exceeds the threshold. |
| ItemParamStatus |
Indicates the processing status of an item or parameter. Known values supported by the service include completed: Item or parameter has finished processing. |
| ItemType |
Type discriminator for conversation items. |
| MCPApprovalType |
The available set of MCP approval types. Known values supported by the service include never: Approval is never required. |
| MessageContentPartUnion |
Union of all MessageContentPart variants. |
| MessageItemUnion |
Union of all MessageItem variants. |
| MessageRole |
Role of a message within a conversation (system, user, or assistant). |
| Modality |
Supported modalities for the session. Known values supported by the service include text: Text modality. |
| OAIVoice |
Supported OpenAI voice names (string enum). Known values supported by the service include alloy: Alloy voice. |
| OutputAudioFormat |
Output audio format types supported. Known values supported by the service include pcm16: 16-bit PCM audio format at the default sampling rate (24 kHz). |
| PersonalVoiceModels |
Personal voice models. Known values supported by the service include DragonLatestNeural: Use the latest Dragon model. |
| PhotoAvatarBaseModes |
Photo avatar base modes. Known values supported by the service include vasa-1: VASA-1 model. |
| ReasoningEffort |
Constrains effort on reasoning for reasoning models. Check model documentation for supported values for each model.
Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response. Known values supported by the service include none: No reasoning effort. |
| RequestImageContentPartDetail |
Specifies an image's detail level. Can be 'auto', 'low', 'high', or an unknown future value. Known values supported by the service include auto: Automatically select an appropriate detail level. |
| ResponseItemStatus |
Indicates the processing status of a response item. Known values supported by the service include in_progress: Item that is in progress. |
| ResponseItemUnion |
Union of all ResponseItem variants. |
| ResponseStatus |
Terminal status of a response. Known values supported by the service include completed. |
| ResponseStatusDetailsUnion |
Union of all ResponseStatusDetails variants. |
| ServerEventType |
Server event types used in VoiceLive protocol. Known values supported by the service include error. |
| ServerEventUnion |
Union of all ServerEvent variants. |
| SessionTarget |
Target for a Voice Live session, specifying either a model or an agent. |
| ToolChoice |
The combined set of available representations for a voicelive tool_choice parameter, encompassing both string literal options like 'auto' as well as structured references to defined tools. |
| ToolChoiceLiteral |
The available set of mode-level, string literal tool_choice options for the voicelive endpoint. Known values supported by the service include auto: Specifies that the model should freely determine which tool or tools, if any, to call. |
| ToolChoiceSelectionUnion |
Union of all ToolChoiceSelection variants. |
| ToolType |
The supported tool type discriminators for voicelive tools.
Currently, only 'function' tools are supported. Known values supported by the service include function. |
| ToolUnion |
Union of all Tool variants. |
| TurnDetectionType |
Type discriminator for turn detection configurations. |
| TurnDetectionUnion |
Union of all TurnDetection variants. |
| Voice |
Union of all supported voice configurations. |
Enums
| ConnectionState |
Connection state enumeration for lifecycle management |
| KnownAnimationOutputType |
Specifies the types of animation data to output. |
| KnownAudioTimestampType |
Output timestamp types supported in audio response content. |
| KnownAvatarConfigTypes |
Avatar config types |
| KnownAvatarOutputProtocol |
Avatar config output protocols |
| KnownAzureVoiceType |
Union of all supported Azure voice types. |
| KnownClientEventType |
Client event types used in VoiceLive protocol. |
| KnownContentPartType |
Known values of ContentPartType that the service accepts. |
| KnownEouThresholdLevel |
Threshold level settings for Azure semantic end-of-utterance detection. |
| KnownInputAudioFormat |
Input audio format types supported. |
| KnownInterimResponseConfigType |
Interim response configuration types. |
| KnownInterimResponseTrigger |
Triggers that can activate interim response generation. |
| KnownItemParamStatus |
Indicates the processing status of an item or parameter. |
| KnownItemType |
Known values of ItemType that the service accepts. |
| KnownMCPApprovalType |
The available set of MCP approval types. |
| KnownMessageRole |
Known values of MessageRole that the service accepts. |
| KnownModality |
Supported modalities for the session. |
| KnownOAIVoice |
Supported OpenAI voice names (string enum). |
| KnownOutputAudioFormat |
Output audio format types supported. |
| KnownPersonalVoiceModels |
PersonalVoice models |
| KnownPhotoAvatarBaseModes |
Photo avatar base modes |
| KnownReasoningEffort |
Constrains effort on reasoning for reasoning models. Check model documentation for supported values for each model. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response. |
| KnownRequestImageContentPartDetail |
Specifies an image's detail level. Can be 'auto', 'low', 'high', or an unknown future value. |
| KnownResponseItemStatus |
Indicates the processing status of a response item. |
| KnownResponseStatus |
Terminal status of a response. |
| KnownServerEventType |
Server event types used in VoiceLive protocol. |
| KnownToolChoiceLiteral |
The available set of mode-level, string literal tool_choice options for the voicelive endpoint. |
| KnownToolType |
The supported tool type discriminators for voicelive tools. Currently, only 'function' tools are supported. |
| KnownTurnDetectionType |
Known values of TurnDetectionType that the service accepts. |
| VoiceLiveErrorCodes |
Error codes for Voice Live WebSocket operations |
Functions
| classifyConnectionError |
Classifies connection errors |
| classifyProtocolError |
Classifies protocol errors |
| isAgentSessionTarget |
Type guard to check if a SessionTarget specifies an agent session. |
| isModelSessionTarget |
Type guard to check if a SessionTarget specifies a model session. |
Function Details
classifyConnectionError(unknown)
Classifies connection errors
function classifyConnectionError(error: unknown): VoiceLiveConnectionError
Parameters
- error: unknown
Returns
VoiceLiveConnectionError
classifyProtocolError(Error, string)
Classifies protocol errors
function classifyProtocolError(error: Error, messageType: string): VoiceLiveProtocolError
Parameters
- error: Error
- messageType: string
Returns
VoiceLiveProtocolError
isAgentSessionTarget(SessionTarget)
Type guard to check if a SessionTarget specifies an agent session.
function isAgentSessionTarget(target: SessionTarget): target
Parameters
- target: SessionTarget
The session target to check
Returns
A type predicate that is true if the target specifies an agent session
isModelSessionTarget(SessionTarget)
Type guard to check if a SessionTarget specifies a model session.
function isModelSessionTarget(target: SessionTarget): target
Parameters
- target: SessionTarget
The session target to check
Returns
A type predicate that is true if the target specifies a model session
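The two type guards above return TypeScript type predicates so that narrowing works in conditionals. A self-contained sketch of the same pattern; the discriminating property names ("agent" vs "model") are assumptions for illustration, since the real SessionTarget shape is defined by the SDK:

```typescript
// Hypothetical SessionTarget shape; property names are assumptions.
type SessionTargetSketch =
  | { agent: { agentId: string } }
  | { model: string };

// Type predicate: narrows to the agent variant when it returns true.
function isAgentSessionTargetSketch(
  t: SessionTargetSketch,
): t is { agent: { agentId: string } } {
  return "agent" in t;
}

// Type predicate: narrows to the model variant when it returns true.
function isModelSessionTargetSketch(
  t: SessionTargetSketch,
): t is { model: string } {
  return "model" in t;
}
```

After `if (isAgentSessionTargetSketch(target)) { ... }`, the compiler knows `target.agent` exists inside the branch; that is the practical benefit of the `target is ...` return type over a plain boolean.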