Azure VoiceLive client library for JavaScript - version 1.0.0-beta.1

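Azure VoiceLive is a managed service that enables low-latency, high-quality speech-to-speech interactions for voice agents. The service consolidates speech recognition, generative AI, and text-to-speech into a single unified interface, providing an end-to-end solution for building seamless voice experiences.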

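Use the client library to:

Create real-time voice assistants and conversational agents
Build speech-to-speech applications with minimal latency
Integrate advanced conversational features such as noise suppression and echo cancellation
Leverage multiple AI models (GPT-4o, GPT-4o-mini, Phi) for different use cases
Implement function calling and tool integration for dynamic responses
Create avatar-enabled voice interactions with visual components

Note: This package supports both browser and Node.js environments. WebSocket connections are used for real-time communication.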

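Getting started

Currently supported environments

LTS versions of Node.js
Latest versions of Safari, Chrome, Edge, and Firefox

Prerequisites

An Azure subscription
An Azure AI Foundry resource with Voice Live API access

Install the package

Install the Azure VoiceLive client library using npm: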

npm install @azure/ai-voicelive

Install the identity library

VoiceLive clients authenticate using the Azure Identity library. Install it as well:

npm install @azure/identity

Configure TypeScript

TypeScript users need to have Node type definitions installed:

npm install @types/node

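You also need to enable compilerOptions.allowSyntheticDefaultImports in your tsconfig.json. Note that if you have enabled compilerOptions.esModuleInterop, allowSyntheticDefaultImports is enabled by default.

JavaScript Bundle

To use this client library in the browser, you first need to use a bundler. For details on how to do this, refer to our bundling documentation.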

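Key concepts

VoiceLiveClient

The primary interface for establishing connections to the Azure VoiceLive service. Use this client to authenticate and create sessions for real-time voice interactions.

VoiceLiveSession

Represents an active WebSocket connection for real-time voice communication. This class handles bidirectional communication, allowing you to send audio input and receive audio output, text transcriptions, and other events in real time.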

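Session configuration

The service uses session configuration to control various aspects of the voice interaction:

Turn detection: Configure how the service detects when users start and stop speaking
Audio processing: Enable noise suppression and echo cancellation
Voice selection: Choose from standard Azure voices, high-definition voices, or custom voices
Model selection: Select the AI model (GPT-4o, GPT-4o-mini, Phi variants) that best fits your needs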

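Models and capabilities

The VoiceLive API supports multiple AI models with different capabilities:

Model	Description	Use case
`gpt-4o-realtime-preview`	GPT-4o with real-time audio processing	High-quality conversational AI
`gpt-4o-mini-realtime-preview`	Lightweight GPT-4o variant	Fast, efficient interactions
`phi4-mm-realtime`	Phi model with multimodal support	Cost-effective voice applications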
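
The table above can be captured in code so that a model name is never mistyped. The following is a minimal sketch, not part of the library: the model names and use cases come from the table, while the `ModelGoal` labels and the `chooseModel` helper are hypothetical names introduced for illustration.

```typescript
// Model names are copied from the table above.
type VoiceLiveModel =
  | "gpt-4o-realtime-preview"
  | "gpt-4o-mini-realtime-preview"
  | "phi4-mm-realtime";

// Hypothetical labels summarizing the "Use case" column.
type ModelGoal = "quality" | "speed" | "cost";

const MODEL_BY_GOAL: Record<ModelGoal, VoiceLiveModel> = {
  quality: "gpt-4o-realtime-preview", // high-quality conversational AI
  speed: "gpt-4o-mini-realtime-preview", // fast, efficient interactions
  cost: "phi4-mm-realtime", // cost-effective voice applications
};

// Hypothetical helper: maps an application goal to a model name.
function chooseModel(goal: ModelGoal): VoiceLiveModel {
  return MODEL_BY_GOAL[goal];
}
```

The returned string can then be passed wherever the examples in this document pass a model name, for instance to `client.startSession(...)`.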

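Conversational enhancements

The VoiceLive API provides Azure-specific enhancements:

Azure Semantic VAD: Advanced voice activity detection that removes filler words
Noise suppression: Reduces environmental background noise
Echo cancellation: Removes echo of the model's own voice
End-of-turn detection: Allows natural pauses without premature interruption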

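Authenticating with Azure Active Directory

The VoiceLive service relies on Azure Active Directory to authenticate requests to its APIs. The @azure/identity package provides a variety of credential types that your application can use to do this. The README for @azure/identity provides more details and samples to get you started.

To interact with the Azure VoiceLive service, you need a service endpoint and a credential object, which you pass to the VoiceLiveClient constructor. The examples in this document use a credential object named DefaultAzureCredential, which is appropriate for most scenarios, including local development and production environments. We recommend using a managed identity for authentication in production environments.

You can find more information on the different ways of authenticating and their corresponding credential types in the Azure Identity documentation.

Here is a quick example that imports DefaultAzureCredential and VoiceLiveClient and creates a client: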

import { DefaultAzureCredential } from "@azure/identity";
import { VoiceLiveClient } from "@azure/ai-voicelive";

const credential = new DefaultAzureCredential();

// Build the URL to reach your AI Foundry resource
const endpoint = "https://your-resource.cognitiveservices.azure.com";

// Create the VoiceLive client
const client = new VoiceLiveClient(endpoint, credential);

API key authentication

For development scenarios, you can also authenticate using an API key:

import { AzureKeyCredential } from "@azure/core-auth";
import { VoiceLiveClient } from "@azure/ai-voicelive";

const endpoint = "https://your-resource.cognitiveservices.azure.com";
const credential = new AzureKeyCredential("your-api-key");

const client = new VoiceLiveClient(endpoint, credential);
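
Both authentication examples above hard-code the endpoint, and the second one hard-codes the key. One way to avoid that is to read them from the environment and decide which credential type to construct. The sketch below is an illustration only: the environment variable names `AZURE_VOICELIVE_ENDPOINT` and `AZURE_VOICELIVE_API_KEY` and the `resolveAuthConfig` helper are hypothetical and are not defined by the library.

```typescript
// Hypothetical environment variable names; use whatever your deployment defines.
const ENDPOINT_VAR = "AZURE_VOICELIVE_ENDPOINT";
const API_KEY_VAR = "AZURE_VOICELIVE_API_KEY";

// Describes which credential the caller should construct.
type AuthConfig =
  | { kind: "apiKey"; endpoint: string; apiKey: string }
  | { kind: "tokenCredential"; endpoint: string };

// Hypothetical helper: prefers an API key when one is set, otherwise signals
// that a token credential (such as DefaultAzureCredential) should be used.
function resolveAuthConfig(env: Record<string, string | undefined>): AuthConfig {
  const endpoint = env[ENDPOINT_VAR]?.trim();
  if (!endpoint) {
    throw new Error(`Missing required environment variable ${ENDPOINT_VAR}`);
  }
  const apiKey = env[API_KEY_VAR]?.trim();
  return apiKey
    ? { kind: "apiKey", endpoint, apiKey }
    : { kind: "tokenCredential", endpoint };
}
```

In Node.js you would call `resolveAuthConfig(process.env)` and then create either `new AzureKeyCredential(config.apiKey)` or `new DefaultAzureCredential()` depending on `config.kind`, exactly as in the two examples above.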

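Examples

The following sections provide code snippets that cover some common tasks using Azure VoiceLive. The scenarios covered here are:

Creating a basic voice assistant
Configuring session options
Handling real-time events
Implementing function calling

Creating a basic voice assistant

This example shows how to create a simple voice assistant that can handle speech-to-speech interactions: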

import { DefaultAzureCredential } from "@azure/identity";
import { VoiceLiveClient } from "@azure/ai-voicelive";

const credential = new DefaultAzureCredential();
const endpoint = "https://your-resource.cognitiveservices.azure.com";

// Create the client
const client = new VoiceLiveClient(endpoint, credential);

// Create and connect a session
const session = await client.startSession("gpt-4o-mini-realtime-preview");

// Configure session for voice conversation
await session.updateSession({
  modalities: ["text", "audio"],
  instructions: "You are a helpful AI assistant. Respond naturally and conversationally.",
  voice: {
    type: "azure-standard",
    name: "en-US-AvaNeural",
  },
  turnDetection: {
    type: "server_vad",
    threshold: 0.5,
    prefixPaddingMs: 300,
    silenceDurationMs: 500,
  },
  inputAudioFormat: "pcm16",
  outputAudioFormat: "pcm16",
});

Configuring session options

You can customize various aspects of the voice interaction:

import { DefaultAzureCredential } from "@azure/identity";
import { VoiceLiveClient } from "@azure/ai-voicelive";

const credential = new DefaultAzureCredential();
const endpoint = "https://your-resource.cognitiveservices.azure.com";
const client = new VoiceLiveClient(endpoint, credential);
const session = await client.startSession("gpt-4o-realtime-preview");

// Advanced session configuration
await session.updateSession({
  modalities: ["audio", "text"],
  instructions: "You are a customer service representative. Be helpful and professional.",
  voice: {
    type: "azure-custom",
    name: "your-custom-voice-name",
    endpointId: "your-custom-voice-endpoint",
  },
  turnDetection: {
    type: "server_vad",
    threshold: 0.6,
    prefixPaddingMs: 200,
    silenceDurationMs: 300,
  },
  inputAudioFormat: "pcm16",
  outputAudioFormat: "pcm16",
});
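
The session examples above differ mainly in their turnDetection values, and a typo there (a threshold of 6 instead of 0.6, for instance) is easy to miss. The following sketch checks those values before they are sent. It is an illustration under stated assumptions: the `TurnDetectionSettings` type is declared locally to mirror the fields used above and is not the library's own type, and the accepted ranges (a threshold between 0 and 1, non-negative whole-millisecond durations) are assumptions rather than documented limits.

```typescript
// Local type mirroring the turnDetection fields used in the examples above.
// It is not imported from the library; the library's own types may differ.
interface TurnDetectionSettings {
  type: string;
  threshold: number;
  prefixPaddingMs: number;
  silenceDurationMs: number;
}

// Hypothetical helper: returns a list of problems; an empty list means the
// settings passed these checks.
// Assumption: threshold lies in [0, 1] and both durations are non-negative
// whole milliseconds.
function validateTurnDetection(settings: TurnDetectionSettings): string[] {
  const problems: string[] = [];
  if (!(settings.threshold >= 0 && settings.threshold <= 1)) {
    problems.push("threshold must be between 0 and 1");
  }
  for (const key of ["prefixPaddingMs", "silenceDurationMs"] as const) {
    const value = settings[key];
    if (!Number.isInteger(value) || value < 0) {
      problems.push(`${key} must be a non-negative integer`);
    }
  }
  return problems;
}
```

Calling `validateTurnDetection` on the object you intend to pass as `turnDetection` and logging any returned problems is usually enough to catch mistakes before the session is updated.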

Handling real-time events

The VoiceLive client provides event-driven communication for real-time interactions:

import { DefaultAzureCredential } from "@azure/identity";
import { VoiceLiveClient } from "@azure/ai-voicelive";

const credential = new DefaultAzureCredential();
const endpoint = "https://your-resource.cognitiveservices.azure.com";
const client = new VoiceLiveClient(endpoint, credential);
const session = await client.startSession("gpt-4o-mini-realtime-preview");

// Set up event handlers using subscription pattern
const subscription = session.subscribe({
  onResponseAudioDelta: async (event, context) => {
    // Handle incoming audio chunks
    const audioData = event.delta;
    // Play audio using Web Audio API or other audio system.
    // playAudioChunk is an application-defined helper, not part of this library.
    playAudioChunk(audioData);
  },

  onResponseTextDelta: async (event, context) => {
    // Handle incoming text deltas
    console.log("Assistant:", event.delta);
  },

  onInputAudioTranscriptionCompleted: async (event, context) => {
    // Handle user speech transcription
    console.log("User said:", event.transcript);
  },
});

// Send audio data from microphone
function sendAudioChunk(audioBuffer: ArrayBuffer) {
  session.sendAudio(audioBuffer);
}
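
Microphone capture through the Web Audio API yields 32-bit float samples, whereas the sessions above are configured with `inputAudioFormat: "pcm16"`. A conversion step is therefore typically needed before calling `sendAudioChunk`. The sketch below shows one common way to do it. It assumes that "pcm16" means 16-bit signed little-endian samples; `floatTo16BitPcm` is a hypothetical helper name, and resampling to the rate the service expects is not covered here.

```typescript
// Converts Web Audio float samples (nominal range -1..1) to 16-bit
// little-endian PCM. Assumption: this is the layout "pcm16" refers to.
function floatTo16BitPcm(samples: Float32Array): ArrayBuffer {
  const buffer = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp so out-of-range samples do not wrap around.
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    // Scale negatives by 0x8000 and positives by 0x7fff to cover the int16 range.
    const value = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(i * 2, Math.round(value), true); // true = little-endian
  }
  return buffer;
}
```

With this in place, a capture callback could call `sendAudioChunk(floatTo16BitPcm(samples))` for each block of microphone samples.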

Implementing function calling

Enable your voice assistant to call external functions and tools:

import { DefaultAzureCredential } from "@azure/identity";
import { VoiceLiveClient } from "@azure/ai-voicelive";

const credential = new DefaultAzureCredential();
const endpoint = "https://your-resource.cognitiveservices.azure.com";
const client = new VoiceLiveClient(endpoint, credential);
const session = await client.startSession("gpt-4o-mini-realtime-preview");

// Define available functions
const tools = [
  {
    type: "function",
    name: "get_weather",
    description: "Get current weather for a location",
    parameters: {
      type: "object",
      properties: {
        location: {
          type: "string",
          description: "The city and state or country",
        },
      },
      required: ["location"],
    },
  },
];

// Configure session with tools
await session.updateSession({
  modalities: ["audio", "text"],
  instructions:
    "You can help users with weather information. Use the get_weather function when needed.",
  tools: tools,
  toolChoice: "auto",
});

// Handle function calls
const subscription = session.subscribe({
  onResponseFunctionCallArgumentsDone: async (event, context) => {
    if (event.name === "get_weather") {
      const args = JSON.parse(event.arguments);
      // getWeatherData is an application-defined helper, not part of this library
      const weatherData = await getWeatherData(args.location);

      // Send function result back
      await session.addConversationItem({
        type: "function_call_output",
        callId: event.callId,
        output: JSON.stringify(weatherData),
      });

      // Request response generation
      await session.sendEvent({
        type: "response.create",
      });
    }
  },
});
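
The handler above checks for a single function name and calls `JSON.parse` directly, so a malformed argument string from the model would throw inside the handler. When several tools are registered, a small dispatcher keeps this logic in one place. The following is a sketch only: `ToolHandler`, `parseToolArguments`, and `runTool` are hypothetical names, and reporting failures to the model as a JSON object with an `error` field is a design choice made here, not something the service is known to require.

```typescript
// A handler receives parsed arguments and returns any JSON-serializable value.
type ToolHandler = (args: Record<string, unknown>) => unknown | Promise<unknown>;

// Parses the JSON argument string; returns undefined when the payload is not
// valid JSON or is not a JSON object.
function parseToolArguments(raw: string): Record<string, unknown> | undefined {
  try {
    const parsed: unknown = JSON.parse(raw);
    return typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)
      ? (parsed as Record<string, unknown>)
      : undefined;
  } catch {
    return undefined;
  }
}

// Looks up the handler by name, runs it, and always resolves to a JSON string
// that can be used as the `output` of a function_call_output item.
async function runTool(
  handlers: Record<string, ToolHandler>,
  name: string,
  rawArguments: string,
): Promise<string> {
  // Own-property check so names like "toString" are not treated as tools.
  const handler = Object.prototype.hasOwnProperty.call(handlers, name)
    ? handlers[name]
    : undefined;
  if (!handler) return JSON.stringify({ error: `Unknown tool: ${name}` });
  const args = parseToolArguments(rawArguments);
  if (!args) return JSON.stringify({ error: "Arguments were not a JSON object" });
  try {
    const result = await handler(args);
    return JSON.stringify(result ?? null);
  } catch (err) {
    return JSON.stringify({ error: err instanceof Error ? err.message : String(err) });
  }
}
```

Inside `onResponseFunctionCallArgumentsDone`, the body could then reduce to `output: await runTool(handlers, event.name, event.arguments)`, where `handlers` maps `get_weather` to your own implementation.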

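Troubleshooting

Common errors and exceptions

Authentication errors: If you receive authentication errors, verify that:

Your Azure AI Foundry resource is correctly configured
Your API key or credential has the necessary permissions
The endpoint URL is correct and accessible

WebSocket connection issues: VoiceLive uses WebSocket connections. Ensure that:

Your network allows WebSocket connections
Firewall rules permit connections to *.cognitiveservices.azure.com
Browser policies allow WebSocket and microphone access (for browser usage)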
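
A malformed endpoint string is a frequent cause of both authentication and connection failures, and it can be ruled out before any network call is made. The sketch below performs two basic checks based on the guidance above: the scheme is https, as in every example in this document, and the host falls under *.cognitiveservices.azure.com. `checkEndpoint` and `tryParseUrl` are hypothetical helpers, and the suffix list is an assumption drawn only from the firewall note; extend it if your resource is served from another domain.

```typescript
// Host suffix taken from the firewall guidance above (assumption: no other
// domains are in use; extend this list if yours differs).
const ALLOWED_HOST_SUFFIXES = [".cognitiveservices.azure.com"];

// Returns a parsed URL, or undefined when the string is not an absolute URL.
function tryParseUrl(value: string): URL | undefined {
  try {
    return new URL(value);
  } catch {
    return undefined;
  }
}

// Hypothetical helper: returns a list of problems; an empty list means the
// endpoint passed these basic checks.
function checkEndpoint(endpoint: string): string[] {
  const url = tryParseUrl(endpoint);
  if (!url) return ["endpoint is not a valid absolute URL"];
  const problems: string[] = [];
  if (url.protocol !== "https:") {
    problems.push(`unexpected scheme "${url.protocol}" (expected "https:")`);
  }
  const host = url.hostname;
  if (!ALLOWED_HOST_SUFFIXES.some((suffix) => host.endsWith(suffix))) {
    problems.push(`host "${host}" does not match ${ALLOWED_HOST_SUFFIXES.join(", ")}`);
  }
  return problems;
}
```

Passing these checks does not prove the endpoint is reachable; it only rules out typos such as a missing scheme or a wrong domain.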

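Audio issues: For audio-related problems:

Verify microphone permissions in the browser
Check that the audio format is supported (PCM16, PCM24)
Ensure the audio context is set up properly for playback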
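
Distorted or silent playback is often caused by handing 16-bit PCM bytes to an API that expects float samples. The Web Audio API works with Float32Array data, so PCM16 output usually needs converting first. The following sketch shows that conversion. It assumes the audio arrives as raw 16-bit signed little-endian bytes in a Uint8Array; if your handler receives base64 text instead, decode it to bytes first. `pcm16ToFloat32` is a hypothetical helper name.

```typescript
// Converts 16-bit little-endian PCM bytes to float samples in the range [-1, 1).
// Assumption: the input is raw PCM16, not base64 text or a container format.
function pcm16ToFloat32(bytes: Uint8Array): Float32Array {
  // Respect byteOffset so views into a larger buffer decode correctly.
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const sampleCount = Math.floor(bytes.byteLength / 2); // ignore a trailing odd byte
  const samples = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    samples[i] = view.getInt16(i * 2, true) / 0x8000; // true = little-endian
  }
  return samples;
}
```

In a browser, the result can be copied into an AudioBuffer created with `audioContext.createBuffer(1, samples.length, sampleRate)` via `copyToChannel`, where `sampleRate` must match the rate of the audio the session produces.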

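Logging

Enabling logging may help uncover useful information about failures. To see a log of WebSocket messages and responses, set the AZURE_LOG_LEVEL environment variable to info. Alternatively, you can enable logging at runtime by calling setLogLevel from @azure/logger: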

import { setLogLevel } from "@azure/logger";

setLogLevel("info");

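For more detailed instructions on how to enable logs, see the @azure/logger package docs.

Next steps

You can find more code samples through the following links:

Contributing

If you'd like to contribute to this library, please read the contributing guide to learn more about how to build and test the code.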


Last updated on 2025-11-18