Azure VoiceLive client library for JavaScript - version 1.0.0-beta.1

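Azure VoiceLive is a managed service that enables low-latency, high-quality speech-to-speech interactions. The service integrates speech recognition, generative AI, and text-to-speech into an end-to-end solution for building seamless voice-driven experiences.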

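Use the client library to:

Create real-time voice assistants and conversational agents
Build speech-to-speech applications with minimal latency
Integrate advanced conversational features such as noise suppression and echo cancellation
Use multiple AI models (GPT-4o, GPT-4o-mini, Phi) for different use cases
Implement function calling and tool integration for dynamic responses
Create avatar-enabled voice interactions with visual components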

Note: This package supports both browser and Node.js environments. It uses WebSocket connections for real-time communication.

Getting started

Currently supported environments

LTS versions of Node.js
Latest versions of Safari, Chrome, Edge, and Firefox

Prerequisites

An Azure subscription
An Azure AI Foundry resource with Voice Live API access

Install the package

Install the Azure VoiceLive client library using npm:

npm install @azure/ai-voicelive

Install the identity library

VoiceLive clients authenticate using the Azure Identity library. Install it as well:

npm install @azure/identity

TypeScript configuration

TypeScript users need to have Node type definitions installed:

npm install @types/node

You also need to enable compilerOptions.allowSyntheticDefaultImports in your tsconfig.json. If you have enabled compilerOptions.esModuleInterop, allowSyntheticDefaultImports is enabled by default.

JavaScript bundle

To use this client library in the browser, you first need to use a bundler. For details on how to do this, refer to the bundling documentation.

Key concepts

VoiceLiveClient

The primary interface for establishing a connection to the Azure VoiceLive service. Use this client to authenticate and to create sessions for real-time voice interactions.

VoiceLiveSession

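Represents an active WebSocket connection for real-time voice communication. This class handles bidirectional communication, so you can send and receive audio input, audio output, text transcriptions, and other events in real time.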

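Session configuration

The service uses session configuration to control various aspects of voice interaction:

Turn detection: Configure how the service detects when users start and stop speaking
Audio processing: Enable noise suppression and echo cancellation
Voice selection: Choose from standard Azure voices, high-definition voices, or custom voices
Model selection: Choose the AI model (GPT-4o, GPT-4o-mini, or Phi variants) that best fits your needs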
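You may want to reuse the same turn-detection settings across sessions. One option is to centralize them in a helper. The sketch below defines a hypothetical local `ServerVadSettings` type whose field names mirror the `turnDetection` objects in this README's examples. It is not the SDK's own exported type. The defaults come from the basic assistant example, and the 0 to 1 range for `threshold` is an assumption you should check against the service documentation.

```typescript
// Hypothetical local type mirroring the turnDetection fields used in this
// README's examples; it is NOT a type exported by @azure/ai-voicelive.
interface ServerVadSettings {
  type: "server_vad";
  threshold: number; // speech-detection cutoff, assumed to lie in [0, 1]
  prefixPaddingMs: number; // audio retained before detected speech onset
  silenceDurationMs: number; // trailing silence that ends a turn
}

// Builds a server_vad block from defaults plus overrides and rejects
// out-of-range values before they reach session.updateSession.
function buildServerVad(
  overrides: Partial<Omit<ServerVadSettings, "type">> = {},
): ServerVadSettings {
  const settings: ServerVadSettings = {
    type: "server_vad",
    threshold: 0.5,
    prefixPaddingMs: 300,
    silenceDurationMs: 500,
    ...overrides,
  };
  if (settings.threshold < 0 || settings.threshold > 1) {
    throw new RangeError(`threshold must be between 0 and 1, got ${settings.threshold}`);
  }
  for (const key of ["prefixPaddingMs", "silenceDurationMs"] as const) {
    // Durations must be whole, non-negative milliseconds.
    if (settings[key] < 0 || settings[key] % 1 !== 0) {
      throw new RangeError(`${key} must be a non-negative integer, got ${settings[key]}`);
    }
  }
  return settings;
}
```

You could then pass `turnDetection: buildServerVad({ threshold: 0.6, silenceDurationMs: 300 })` to `session.updateSession`. This keeps the validation in one place rather than repeating literal objects.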

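Models and capabilities

The VoiceLive API supports multiple AI models with different capabilities:

Model	Description	Use case
`gpt-4o-realtime-preview`	GPT-4o with real-time audio processing	High-quality conversational AI
`gpt-4o-mini-realtime-preview`	Lightweight GPT-4o variant	Fast, efficient interactions
`phi4-mm-realtime`	Phi model with multimodal support	Cost-effective voice applications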
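`startSession` takes the model name as a string, so a typo is only caught at runtime. The sketch below encodes the three model names from the table as a string union, which lets the compiler reject misspellings. The `Priority` labels and the `chooseModel` helper are hypothetical and simply restate the table's "Use case" column. Model availability may also vary by resource and region.

```typescript
// Model names copied from the table above.
type VoiceLiveModel =
  | "gpt-4o-realtime-preview"
  | "gpt-4o-mini-realtime-preview"
  | "phi4-mm-realtime";

// Hypothetical priority labels mapped from the table's "Use case" column.
type Priority = "quality" | "speed" | "cost";

const MODEL_BY_PRIORITY: Record<Priority, VoiceLiveModel> = {
  quality: "gpt-4o-realtime-preview", // high-quality conversational AI
  speed: "gpt-4o-mini-realtime-preview", // fast, efficient interactions
  cost: "phi4-mm-realtime", // cost-effective voice applications
};

// Returns the model name for a priority; the return type rules out typos.
function chooseModel(priority: Priority): VoiceLiveModel {
  return MODEL_BY_PRIORITY[priority];
}
```

For example, `client.startSession(chooseModel("speed"))` would start a session on `gpt-4o-mini-realtime-preview`.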

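Conversational enhancements

The VoiceLive API provides Azure-specific enhancements:

Azure Semantic VAD: Advanced voice activity detection that removes filler words
Noise suppression: Reduces background noise from the environment
Echo cancellation: Removes echo of the model's own voice
End-of-turn detection: Allows natural pauses without premature interruption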

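Authenticate with Azure Active Directory

The VoiceLive service relies on Azure Active Directory to authenticate requests to its APIs. The @azure/identity package provides a variety of credential types that your application can use to do this. The README for @azure/identity provides more details and samples to get you started.

To interact with the Azure VoiceLive service, you need to create an instance of the VoiceLiveClient class from a service endpoint and a credential object. The examples in this document use DefaultAzureCredential, which is appropriate for most scenarios, including local development and production environments. For authentication in production environments, we recommend using a managed identity.

You can find more information about the different authentication methods and their corresponding credential types in the Azure Identity documentation.

Here is a quick example. First, import DefaultAzureCredential and VoiceLiveClient: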

import { DefaultAzureCredential } from "@azure/identity";
import { VoiceLiveClient } from "@azure/ai-voicelive";

const credential = new DefaultAzureCredential();

// Build the URL to reach your AI Foundry resource
const endpoint = "https://your-resource.cognitiveservices.azure.com";

// Create the VoiceLive client
const client = new VoiceLiveClient(endpoint, credential);
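The sample above hard-codes the endpoint. If you only have the resource name, a small helper can build the URL for you. The sketch below assumes the `https://<resource>.cognitiveservices.azure.com` pattern shown in this README. `buildVoiceLiveEndpoint` is a hypothetical name, not an SDK function, and the accepted name characters are an assumption.

```typescript
// Hypothetical helper (not part of the SDK): derives the endpoint from an AI
// Foundry resource name, assuming the cognitiveservices.azure.com pattern.
function buildVoiceLiveEndpoint(resourceName: string): string {
  // Assumed naming rule: letters, digits, and hyphens, starting with a letter or digit.
  if (!/^[a-z0-9][a-z0-9-]*$/i.test(resourceName)) {
    throw new Error(`Unexpected resource name: ${resourceName}`);
  }
  // URL lowercases the host, and .origin has no trailing slash.
  return new URL(`https://${resourceName}.cognitiveservices.azure.com`).origin;
}
```

Usage: `const client = new VoiceLiveClient(buildVoiceLiveEndpoint("your-resource"), credential);`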

Authenticate with an API key

For development scenarios, you can also authenticate using an API key:

import { AzureKeyCredential } from "@azure/core-auth";
import { VoiceLiveClient } from "@azure/ai-voicelive";

const endpoint = "https://your-resource.cognitiveservices.azure.com";
const credential = new AzureKeyCredential("your-api-key");

const client = new VoiceLiveClient(endpoint, credential);
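Embedding `"your-api-key"` in source code makes it easy to leak the key. A common alternative is to read the key from configuration at startup. The sketch below takes an environment-like map so that it can be tested without touching real environment variables. `readRequiredSetting` and the variable name in the usage note are hypothetical.

```typescript
// Hypothetical helper: reads a required, non-empty setting from an
// environment-like map so secrets stay out of source code.
function readRequiredSetting(
  env: Record<string, string | undefined>,
  name: string,
): string {
  const value = env[name]?.trim();
  if (!value) {
    throw new Error(`Missing required setting: ${name}`);
  }
  return value;
}
```

In Node.js you might write `new AzureKeyCredential(readRequiredSetting(process.env, "AZURE_VOICELIVE_API_KEY"))`, where the variable name is only an example. If you rotate keys, `AzureKeyCredential` also has an `update(newKey)` method that replaces the key without recreating the client.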

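Examples

The following sections provide code snippets that cover some of the common tasks with Azure VoiceLive. The scenarios covered here are:

Creating a basic voice assistant
Configuring session options
Handling real-time events
Implementing function calling

Creating a basic voice assistant

This example shows how to create a simple voice assistant session configured for speech-to-speech interaction: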

import { DefaultAzureCredential } from "@azure/identity";
import { VoiceLiveClient } from "@azure/ai-voicelive";

const credential = new DefaultAzureCredential();
const endpoint = "https://your-resource.cognitiveservices.azure.com";

// Create the client
const client = new VoiceLiveClient(endpoint, credential);

// Create and connect a session
const session = await client.startSession("gpt-4o-mini-realtime-preview");

// Configure session for voice conversation
await session.updateSession({
  modalities: ["text", "audio"],
  instructions: "You are a helpful AI assistant. Respond naturally and conversationally.",
  voice: {
    type: "azure-standard",
    name: "en-US-AvaNeural",
  },
  turnDetection: {
    type: "server_vad",
    threshold: 0.5,
    prefixPaddingMs: 300,
    silenceDurationMs: 500,
  },
  inputAudioFormat: "pcm16",
  outputAudioFormat: "pcm16",
});
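The example sets `inputAudioFormat` to `"pcm16"` but does not show how to produce that data. In the browser, Web Audio microphone capture yields `Float32Array` samples in the range -1 to 1. The sketch below converts them to 16-bit signed little-endian PCM, which is the byte layout assumed here for `"pcm16"`. It does not resample. The sample rate the service expects (commonly 24 kHz mono for realtime PCM16) is an assumption to confirm in the service documentation, and an `AudioContext` running at 44.1 or 48 kHz would need resampling first.

```typescript
// Converts Web Audio float samples (-1..1) to 16-bit signed little-endian PCM.
// Assumes mono input that is already at the sample rate the service expects.
function floatTo16BitPcm(samples: Float32Array): ArrayBuffer {
  const buffer = new ArrayBuffer(samples.length * 2); // 2 bytes per sample
  const view = new DataView(buffer);
  samples.forEach((sample, i) => {
    // Clamp so out-of-range input does not wrap around.
    const s = Math.max(-1, Math.min(1, sample));
    // Negative values scale toward -32768, positive values toward 32767.
    const value = s < 0 ? s * 0x8000 : s * 0x7fff;
    view.setInt16(i * 2, Math.round(value), true); // true = little-endian
  });
  return buffer;
}
```

The resulting `ArrayBuffer` can be passed to `session.sendAudio`, as in the real-time events example.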

Configuring session options

You can customize various aspects of the voice interaction:

import { DefaultAzureCredential } from "@azure/identity";
import { VoiceLiveClient } from "@azure/ai-voicelive";

const credential = new DefaultAzureCredential();
const endpoint = "https://your-resource.cognitiveservices.azure.com";
const client = new VoiceLiveClient(endpoint, credential);
const session = await client.startSession("gpt-4o-realtime-preview");

// Advanced session configuration
await session.updateSession({
  modalities: ["audio", "text"],
  instructions: "You are a customer service representative. Be helpful and professional.",
  voice: {
    type: "azure-custom",
    name: "your-custom-voice-name",
    endpointId: "your-custom-voice-endpoint",
  },
  turnDetection: {
    type: "server_vad",
    threshold: 0.6,
    prefixPaddingMs: 200,
    silenceDurationMs: 300,
  },
  inputAudioFormat: "pcm16",
  outputAudioFormat: "pcm16",
});
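The two examples use different turn-detection values: threshold 0.5 with 500 ms of silence, and threshold 0.6 with 300 ms. The toy function below is a simplified illustration of how `threshold` and `silenceDurationMs` might interact. It is not the service's actual VAD algorithm. It assumes fixed-size frames, each with a speech probability, and it ignores `prefixPaddingMs`.

```typescript
// Simplified illustration only, not the service's VAD. Returns the index of
// the frame at which a turn would end, or -1 if it has not ended yet.
function findEndOfTurn(
  speechProbabilities: number[],
  frameMs: number,
  threshold: number,
  silenceDurationMs: number,
): number {
  // Consecutive below-threshold frames required to end the turn.
  const silentFramesNeeded = Math.ceil(silenceDurationMs / frameMs);
  let speechStarted = false;
  let silentFrames = 0;
  for (let i = 0; i < speechProbabilities.length; i++) {
    const isSpeech = (speechProbabilities[i] ?? 0) > threshold;
    if (isSpeech) {
      speechStarted = true;
      silentFrames = 0; // any speech resets the silence timer
    } else if (speechStarted) {
      silentFrames++;
      if (silentFrames >= silentFramesNeeded) {
        return i; // enough trailing silence: the turn ends here
      }
    }
  }
  return -1; // still speaking, or speech never started
}
```

With 100 ms frames and the probabilities `[0.1, 0.9, 0.8, 0.2, 0.9, 0.1, 0.1, 0.1, 0.1]`, a 300 ms silence window ends the turn at frame 7. A 500 ms window does not end it within this input. Shorter silence windows make the assistant respond sooner but raise the risk of cutting off a speaker who pauses.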

Handling real-time events

The VoiceLive client provides event-driven communication for real-time interactions:

import { DefaultAzureCredential } from "@azure/identity";
import { VoiceLiveClient } from "@azure/ai-voicelive";

const credential = new DefaultAzureCredential();
const endpoint = "https://your-resource.cognitiveservices.azure.com";
const client = new VoiceLiveClient(endpoint, credential);
const session = await client.startSession("gpt-4o-mini-realtime-preview");

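// Hypothetical placeholder, not part of the SDK: replace with playback through
// the Web Audio API (browser) or an audio output library (Node.js).
function playAudioChunk(_audioData: unknown): void {
  // Decode and enqueue the chunk for playback here.
}

// Set up event handlers using subscription pattern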
const subscription = session.subscribe({
  onResponseAudioDelta: async (event, context) => {
    // Handle incoming audio chunks
    const audioData = event.delta;
    // Play audio using Web Audio API or other audio system
    playAudioChunk(audioData);
  },

  onResponseTextDelta: async (event, context) => {
    // Handle incoming text deltas
    console.log("Assistant:", event.delta);
  },

  onInputAudioTranscriptionCompleted: async (event, context) => {
    // Handle user speech transcription
    console.log("User said:", event.transcript);
  },
});

// Send audio data from microphone
function sendAudioChunk(audioBuffer: ArrayBuffer) {
  session.sendAudio(audioBuffer);
}
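`sendAudioChunk` forwards whatever buffer it receives. If you capture audio in large blocks, you might split it into short chunks so it streams steadily. The sketch below assumes mono PCM16 (2 bytes per sample) with sample-aligned input. The 24 kHz rate and 100 ms chunk size in the usage note are illustrative choices, not documented requirements. `chunkPcm16` is a hypothetical helper.

```typescript
// Splits a mono PCM16 buffer into chunks of about chunkMs milliseconds.
// Assumes 2 bytes per sample and a byte length that is a multiple of 2.
function chunkPcm16(buffer: ArrayBuffer, sampleRate: number, chunkMs: number): ArrayBuffer[] {
  const bytesPerSample = 2;
  const samplesPerChunk = Math.max(1, Math.floor((sampleRate * chunkMs) / 1000));
  const chunkBytes = samplesPerChunk * bytesPerSample;
  const chunks: ArrayBuffer[] = [];
  for (let offset = 0; offset < buffer.byteLength; offset += chunkBytes) {
    // slice copies the bytes; the final chunk may be shorter than chunkBytes.
    chunks.push(buffer.slice(offset, Math.min(offset + chunkBytes, buffer.byteLength)));
  }
  return chunks;
}
```

Usage: `for (const chunk of chunkPcm16(pcmBuffer, 24000, 100)) { sendAudioChunk(chunk); }`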

Implementing function calling

Enable your voice assistant to call external functions and tools:

import { DefaultAzureCredential } from "@azure/identity";
import { VoiceLiveClient } from "@azure/ai-voicelive";

const credential = new DefaultAzureCredential();
const endpoint = "https://your-resource.cognitiveservices.azure.com";
const client = new VoiceLiveClient(endpoint, credential);
const session = await client.startSession("gpt-4o-mini-realtime-preview");

// Define available functions
const tools = [
  {
    type: "function" as const, // keep the literal type instead of widening to string
    name: "get_weather",
    description: "Get current weather for a location",
    parameters: {
      type: "object",
      properties: {
        location: {
          type: "string",
          description: "The city and state or country",
        },
      },
      required: ["location"],
    },
  },
];

// Configure session with tools
await session.updateSession({
  modalities: ["audio", "text"],
  instructions:
    "You can help users with weather information. Use the get_weather function when needed.",
  tools: tools,
  toolChoice: "auto",
});

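// Hypothetical helper, not part of the SDK: replace with a call to your weather provider.
async function getWeatherData(location: string): Promise<{ location: string; forecast: string }> {
  return { location, forecast: "sunny" }; // placeholder data
}

// Handle function calls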
const subscription = session.subscribe({
  onResponseFunctionCallArgumentsDone: async (event, context) => {
    if (event.name === "get_weather") {
      const args = JSON.parse(event.arguments);
      const weatherData = await getWeatherData(args.location);

      // Send function result back
      await session.addConversationItem({
        type: "function_call_output",
        callId: event.callId,
        output: JSON.stringify(weatherData),
      });

      // Request response generation
      await session.sendEvent({
        type: "response.create",
      });
    }
  },
});
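The handler above checks a single tool name inline. With several tools, a small dispatcher can parse the arguments and run the matching handler. It returns the string to use as the `output` of the `function_call_output` item. The sketch below is a hypothetical helper that does not depend on SDK types. It reports failures as a JSON error object instead of throwing, on the assumption that the model can then explain the problem to the user.

```typescript
// Hypothetical handler shape: takes parsed JSON arguments and resolves to a
// JSON-serializable result.
type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;

// Parses the arguments, runs the named handler, and returns the JSON string
// to send back as the function_call_output item's `output`.
async function runToolCall(
  handlers: Record<string, ToolHandler>,
  name: string,
  argumentsJson: string,
): Promise<string> {
  // hasOwnProperty avoids matching inherited keys such as "toString".
  const handler = Object.prototype.hasOwnProperty.call(handlers, name) ? handlers[name] : undefined;
  if (!handler) {
    return JSON.stringify({ error: `Unknown tool: ${name}` });
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(argumentsJson);
  } catch {
    return JSON.stringify({ error: "Arguments were not valid JSON" });
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return JSON.stringify({ error: "Arguments must be a JSON object" });
  }
  try {
    const result = await handler(parsed as Record<string, unknown>);
    // JSON.stringify(undefined) is not a string, so map it to null.
    return JSON.stringify(result === undefined ? null : result);
  } catch (err) {
    return JSON.stringify({ error: err instanceof Error ? err.message : String(err) });
  }
}
```

Inside `onResponseFunctionCallArgumentsDone`, you could then set `output: await runToolCall(handlers, event.name, event.arguments)` instead of branching on each tool name.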

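Troubleshooting

Common errors and exceptions

Authentication errors: If you receive authentication errors, verify that:

Your Azure AI Foundry resource is configured correctly
Your API key or credential has the required permissions
The endpoint URL is correct and accessible

WebSocket connection issues: VoiceLive uses WebSocket connections. Make sure that:

Your network allows WebSocket connections
Firewall rules allow connections to *.cognitiveservices.azure.com
Browser policies allow WebSocket and microphone access (for browser usage)

Audio issues: For audio-related problems:

Check microphone permissions in the browser
Verify that the audio format (PCM16, PCM24) is supported
Make sure the audio context is set up correctly for playback

Logging

Enabling logging may help uncover useful information about failures. To see a log of WebSocket messages and responses, set the AZURE_LOG_LEVEL environment variable to info. Alternatively, you can enable logging at runtime by calling setLogLevel from the @azure/logger package: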

import { setLogLevel } from "@azure/logger";

setLogLevel("info");

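For more detailed instructions on how to enable logs, see the @azure/logger package documentation.

Next steps

You can find more code samples through the following links:

Contributing

If you'd like to contribute to this library, please read the contributing guide to learn more about how to build and test the code.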

Last updated on 2025-11-18