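GPT Realtime API for speech and audio

The Azure OpenAI GPT Realtime API for speech and audio is part of the GPT-4o model family and supports low-latency "speech in, speech out" conversational interactions.

You can use the Realtime API via WebRTC or WebSocket to send audio input to the model and receive audio responses in real time.

Follow the instructions in this article to get started with the Realtime API via WebSockets. Use the Realtime API via WebSockets in server-to-server scenarios where low latency isn't a requirement.

Tip

In most cases, we recommend using the Realtime API via WebRTC for real-time audio streaming in client-side applications such as web applications or mobile apps. WebRTC is designed for low-latency, real-time audio streaming and is the best choice for most use cases.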

Supported models

The GPT real-time models are available for global deployments.

gpt-4o-realtime-preview (version 2024-12-17)
gpt-4o-mini-realtime-preview (version 2024-12-17)
gpt-realtime (version 2025-08-28)
gpt-realtime-mini (version 2025-10-06)

For more information, see the models and versions documentation.

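API support

Support for the Realtime API was first added in API version 2024-10-01-preview (retired). Use version 2025-08-28 to access the latest Realtime API features. We recommend that you select the generally available API version (without the '-preview' suffix) when possible.

Caution

Preview models and generally available (GA) models require different endpoint formats. All samples in this article use GA models and the GA endpoint format. Don't use the api-version parameter; it's required only by the preview endpoint format. For details, see the information on endpoint formats in this article.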
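As a rough illustration of the GA endpoint format, the sketch below derives the WebSocket base URL the same way the Python sample later in this article does: it swaps the scheme to wss://, trims a trailing slash, and appends /openai/v1, with no api-version query parameter. The helper name realtime_base_url and the resource name my-resource are placeholders introduced here for illustration only.

```python
def realtime_base_url(endpoint: str) -> str:
    """Build the GA WebSocket base URL from an https:// resource endpoint.

    Hypothetical helper: it mirrors the string handling used by the Python
    sample in this article and adds no api-version parameter.
    """
    if not endpoint.startswith("https://"):
        raise ValueError("expected an endpoint that starts with https://")
    host_and_path = endpoint[len("https://"):].rstrip("/")
    return "wss://" + host_and_path + "/openai/v1"


# "my-resource" is a placeholder resource name.
print(realtime_base_url("https://my-resource.openai.azure.com/"))
```

Rejecting anything that doesn't start with https:// is a design choice of this sketch, intended to catch an empty or mistyped AZURE_OPENAI_ENDPOINT value early.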

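Prerequisites (JavaScript)

An Azure subscription. You can create a free account.
Node.js with LTS or ESM support.
An Azure OpenAI resource created in one of the supported regions. For more information about region availability, see the models and versions documentation.
Then, you need to deploy a gpt-realtime model with your Azure OpenAI resource. For more information, see "Create a resource and deploy a model with Azure OpenAI".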

Microsoft Entra ID prerequisites

For the recommended keyless authentication with Microsoft Entra ID, you need to:

Install the Azure CLI, which is used for keyless authentication with Microsoft Entra ID.
Assign the Cognitive Services OpenAI User role to your user account. You can assign roles in the Azure portal under Access control (IAM) > Add role assignment.

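Deploy a model for real-time audio

To deploy the gpt-realtime model in the Microsoft Foundry portal:

1. Go to the Foundry portal and create or select your project.
2. Select your model deployments:
   a. For an Azure OpenAI resource, select Deployments from the Shared resources section in the left pane.
   b. For a Foundry resource, select Models + endpoints under My assets in the left pane.
3. Select + Deploy model > Deploy base model to open the deployment window.
4. Search for and select the gpt-realtime model, and then select Confirm.
5. Review the deployment details and select Deploy.
6. Follow the wizard to finish deploying the model.

Now that you have a deployment of the gpt-realtime model, you can interact with it in the Foundry portal Audio playground or through the Realtime API.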

Setup

Create a new folder named realtime-audio-quickstart-js and go to the quickstart folder with the following command:
```
mkdir realtime-audio-quickstart-js && cd realtime-audio-quickstart-js
```
Create the package.json file with the following command:
```
npm init -y
```
Set type to module in package.json with the following command:
```
npm pkg set type=module
```
Install the OpenAI client library for JavaScript with:
```
npm install openai
```
Install the dependent package used by the OpenAI client library for JavaScript with:
```
npm install ws
```
For the recommended keyless authentication with Microsoft Entra ID, install the @azure/identity package with:
```
npm install @azure/identity
```

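Retrieve resource information

You need to retrieve the following information to authenticate your application with your Azure OpenAI resource. The variables you need depend on whether you authenticate with Microsoft Entra ID or with an API key.

Microsoft Entra ID

Variable name	Value
`AZURE_OPENAI_ENDPOINT`	This value can be found in the Keys and Endpoint section when examining your resource from the Azure portal.
`AZURE_OPENAI_DEPLOYMENT_NAME`	This value corresponds to the custom name you chose for your deployment when you deployed a model. This value can be found under Resource Management > Model Deployments in the Azure portal.

Learn more about keyless authentication and setting environment variables.

API key

Variable name	Value
`AZURE_OPENAI_ENDPOINT`	This value can be found in the Keys and Endpoint section when examining your resource from the Azure portal.
`AZURE_OPENAI_API_KEY`	This value can be found in the Keys and Endpoint section when examining your resource from the Azure portal. You can use either `KEY1` or `KEY2`.
`AZURE_OPENAI_DEPLOYMENT_NAME`	This value corresponds to the custom name you chose for your deployment when you deployed a model. This value can be found under Resource Management > Model Deployments in the Azure portal.

Learn more about finding API keys and setting environment variables.

Important

Use API keys with caution. Don't include the API key directly in your code, and never post it publicly. If you use an API key, store it securely in Azure Key Vault. For more information about using API keys securely in your apps, see the article on API keys with Azure Key Vault.

For more information about AI services security, see "Authenticate requests to Azure AI services".

Caution

To use the recommended keyless authentication with the SDK, make sure that the AZURE_OPENAI_API_KEY environment variable isn't set.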

Create the index.js file with the following code. This version uses keyless authentication with Microsoft Entra ID:

import OpenAI from 'openai';
import { OpenAIRealtimeWS } from 'openai/realtime/ws';
import { DefaultAzureCredential, getBearerTokenProvider } from '@azure/identity';
import { OpenAIRealtimeError } from 'openai/realtime/internal-base';

let isCreated = false;
let isConfigured = false;
let responseDone = false;

// Set this to false, if you want to continue receiving events after an error is received.
const throwOnError = true;

async function main() {
    // The endpoint of your Azure OpenAI resource is required. You can set it in the AZURE_OPENAI_ENDPOINT
    // environment variable or replace the default value below.
    // You can find it in the Microsoft Foundry portal in the Overview page of your Azure OpenAI resource.
    // Example: https://{your-resource}.openai.azure.com
    const endpoint = process.env.AZURE_OPENAI_ENDPOINT || 'AZURE_OPENAI_ENDPOINT';
    const baseUrl = endpoint.replace(/\/$/, "") + '/openai/v1';

    // The deployment name of your Azure OpenAI model is required. You can set it in the AZURE_OPENAI_DEPLOYMENT_NAME
    // environment variable or replace the default value below.
    // You can find it in the Foundry portal in the "Models + endpoints" page of your Azure OpenAI resource.
    // Example: gpt-realtime
    const deploymentName = process.env.AZURE_OPENAI_DEPLOYMENT_NAME || 'gpt-realtime';

    // Keyless authentication
    const credential = new DefaultAzureCredential();
    const scope = 'https://cognitiveservices.azure.com/.default';
    const azureADTokenProvider = getBearerTokenProvider(credential, scope);
    const token = await azureADTokenProvider();

    // The APIs are compatible with the OpenAI client library.
    // You can use the OpenAI client library to access the Azure OpenAI APIs.
    // Make sure to set the baseURL and apiKey to use the Azure OpenAI endpoint and token.
    const openAIClient = new OpenAI({
        baseURL: baseUrl,
        apiKey: token,
    });
    const realtimeClient = await OpenAIRealtimeWS.create(openAIClient, {
        model: deploymentName
    });

    realtimeClient.on('error', (receivedError) => receiveError(receivedError));
    realtimeClient.on('session.created', (receivedEvent) => receiveEvent(receivedEvent));
    realtimeClient.on('session.updated', (receivedEvent) => receiveEvent(receivedEvent));
    realtimeClient.on('response.output_audio.delta', (receivedEvent) => receiveEvent(receivedEvent));
    realtimeClient.on('response.output_audio_transcript.delta', (receivedEvent) => receiveEvent(receivedEvent));
    realtimeClient.on('response.done', (receivedEvent) => receiveEvent(receivedEvent));

    console.log('Waiting for events...');
    while (!isCreated) {
        console.log('Waiting for session.created event...');
        await new Promise((resolve) => setTimeout(resolve, 100));
    }

    // After the session is created, configure it to enable audio input and output.
    const sessionConfig = {
        'type': 'realtime',
        'instructions': 'You are a helpful assistant. You respond by voice and text.',
        'output_modalities': ['audio'],
        'audio': {
            'input': {
                'transcription': {
                    'model': 'whisper-1'
                },
                'format': {
                    'type': 'audio/pcm',
                    'rate': 24000,
                },
                'turn_detection': {
                    'type': 'server_vad',
                    'threshold': 0.5,
                    'prefix_padding_ms': 300,
                    'silence_duration_ms': 200,
                    'create_response': true
                }
            },
            'output': {
                'voice': 'alloy',
                'format': {
                    'type': 'audio/pcm',
                    'rate': 24000,
                }
            }
        }
    };

    realtimeClient.send({
        'type': 'session.update',
        'session': sessionConfig
    });
    while (!isConfigured) {
        console.log('Waiting for session.updated event...');
        await new Promise((resolve) => setTimeout(resolve, 100));
    }

    // After the session is configured, data can be sent to the session.    
    realtimeClient.send({
        'type': 'conversation.item.create',
        'item': {
            'type': 'message',
            'role': 'user',
            'content': [{
                type: 'input_text',
                text: 'Please assist the user.'
            }
            ]
        }
    });

    realtimeClient.send({
        type: 'response.create'
    });



    // While waiting for the session to finish, the events can be handled in the event handlers.
    // In this example, we just wait for the first response.done event.
    while (!responseDone) {
        console.log('Waiting for response.done event...');
        await new Promise((resolve) => setTimeout(resolve, 100));
    }

    console.log('The sample completed successfully.');
    realtimeClient.close();
}

function receiveError(err) {
    if (err instanceof OpenAIRealtimeError) {
        console.error('Received an error event.');
        console.error(`Message: ${err.cause.message}`);
        console.error(`Stack: ${err.cause.stack}`);
    }

    if (throwOnError) {
        throw err;
    }
}

function receiveEvent(event) {
    console.log(`Received an event: ${event.type}`);

    switch (event.type) {
        case 'session.created':
            console.log(`Session ID: ${event.session.id}`);
            isCreated = true;
            break;
        case 'session.updated':
            console.log(`Session ID: ${event.session.id}`);
            isConfigured = true;
            break;
        case 'response.output_audio_transcript.delta':
            console.log(`Transcript delta: ${event.delta}`);
            break;
        case 'response.output_audio.delta': {
            // Scope the declaration to this case so it can't leak into other cases.
            const audioBuffer = Buffer.from(event.delta, 'base64');
            console.log(`Audio delta length: ${audioBuffer.length} bytes`);
            break;
        }
        case 'response.done':
            console.log(`Response ID: ${event.response.id}`);
            console.log(`The final response is: ${event.response.output[0].content[0].transcript}`);
            responseDone = true;
            break;
        default:
            console.warn(`Unhandled event type: ${event.type}`);
    }
}

main().catch((err) => {
    console.error('The sample encountered an error:', err);
});
export {
    main
};

Sign in to Azure with the following command:
```
az login
```
Run the JavaScript file.
```
node index.js
```

Create the index.js file with the following code. This version uses API key authentication:

import OpenAI from 'openai';
import { OpenAIRealtimeWS } from 'openai/realtime/ws';
import { OpenAIRealtimeError } from 'openai/realtime/internal-base';

let isCreated = false;
let isConfigured = false;
let responseDone = false;

// Set this to false, if you want to continue receiving events after an error is received.
const throwOnError = true;

async function main() {
    // The endpoint of your Azure OpenAI resource is required. You can set it in the AZURE_OPENAI_ENDPOINT
    // environment variable or replace the default value below.
    // You can find it in the Foundry portal in the Overview page of your Azure OpenAI resource.
    // Example: https://{your-resource}.openai.azure.com
    const endpoint = process.env.AZURE_OPENAI_ENDPOINT || 'AZURE_OPENAI_ENDPOINT';
    const baseUrl = endpoint.replace(/\/$/, "") + '/openai/v1';

    // The deployment name of your Azure OpenAI model is required. You can set it in the AZURE_OPENAI_DEPLOYMENT_NAME
    // environment variable or replace the default value below.
    // You can find it in the Foundry portal in the "Models + endpoints" page of your Azure OpenAI resource.
    // Example: gpt-realtime
    const deploymentName = process.env.AZURE_OPENAI_DEPLOYMENT_NAME || 'gpt-realtime';

    // API Key of your Azure OpenAI resource is required. You can set it in the AZURE_OPENAI_API_KEY
    // environment variable or replace the default value below.
    // You can find it in the Foundry portal in the Overview page of your Azure OpenAI resource.
    const token = process.env.AZURE_OPENAI_API_KEY || '<Your API Key>';

    // The APIs are compatible with the OpenAI client library.
    // You can use the OpenAI client library to access the Azure OpenAI APIs.
    // Make sure to set the baseURL and apiKey to use the Azure OpenAI endpoint and token.
    const openAIClient = new OpenAI({
        baseURL: baseUrl,
        apiKey: token,
    });

    // Due to the current SDK limitation we need to explicitly
    // pass API key as Header
    const realtimeClient = await OpenAIRealtimeWS.create(
        openAIClient, {
        model: deploymentName,
        options: {
            headers: {
                "api-key": token
            }
        }
    });

    realtimeClient.on('error', (receivedError) => receiveError(receivedError));
    realtimeClient.on('session.created', (receivedEvent) => receiveEvent(receivedEvent));
    realtimeClient.on('session.updated', (receivedEvent) => receiveEvent(receivedEvent));
    realtimeClient.on('response.output_audio.delta', (receivedEvent) => receiveEvent(receivedEvent));
    realtimeClient.on('response.output_audio_transcript.delta', (receivedEvent) => receiveEvent(receivedEvent));
    realtimeClient.on('response.done', (receivedEvent) => receiveEvent(receivedEvent));

    console.log('Waiting for events...');
    while (!isCreated) {
        console.log('Waiting for session.created event...');
        await new Promise((resolve) => setTimeout(resolve, 100));
    }

    // After the session is created, configure it to enable audio input and output.
    const sessionConfig = {
        'type': 'realtime',
        'instructions': 'You are a helpful assistant. You respond by voice and text.',
        'output_modalities': ['audio'],
        'audio': {
            'input': {
                'transcription': {
                    'model': 'whisper-1'
                },
                'format': {
                    'type': 'audio/pcm',
                    'rate': 24000,
                },
                'turn_detection': {
                    'type': 'server_vad',
                    'threshold': 0.5,
                    'prefix_padding_ms': 300,
                    'silence_duration_ms': 200,
                    'create_response': true
                }
            },
            'output': {
                'voice': 'alloy',
                'format': {
                    'type': 'audio/pcm',
                    'rate': 24000,
                }
            }
        }
    };

    realtimeClient.send({
        'type': 'session.update',
        'session': sessionConfig
    });
    while (!isConfigured) {
        console.log('Waiting for session.updated event...');
        await new Promise((resolve) => setTimeout(resolve, 100));
    }

    // After the session is configured, data can be sent to the session.
    realtimeClient.send({
        'type': 'conversation.item.create',
        'item': {
            'type': 'message',
            'role': 'user',
            'content': [{
                type: 'input_text',
                text: 'Please assist the user.'
            }
            ]
        }
    });

    realtimeClient.send({
        type: 'response.create'
    });

    // While waiting for the session to finish, the events can be handled in the event handlers.
    // In this example, we just wait for the first response.done event.
    while (!responseDone) {
        console.log('Waiting for response.done event...');
        await new Promise((resolve) => setTimeout(resolve, 100));
    }

    console.log('The sample completed successfully.');
    realtimeClient.close();
}

function receiveError(err) {
    if (err instanceof OpenAIRealtimeError) {
        console.error('Received an error event.');
        console.error(`Message: ${err.cause.message}`);
        console.error(`Stack: ${err.cause.stack}`);
    }

    if (throwOnError) {
        throw err;
    }
}

function receiveEvent(event) {
    console.log(`Received an event: ${event.type}`);

    switch (event.type) {
        case 'session.created':
            console.log(`Session ID: ${event.session.id}`);
            isCreated = true;
            break;
        case 'session.updated':
            console.log(`Session ID: ${event.session.id}`);
            isConfigured = true;
            break;
        case 'response.output_audio_transcript.delta':
            console.log(`Transcript delta: ${event.delta}`);
            break;
        case 'response.output_audio.delta': {
            // Scope the declaration to this case so it can't leak into other cases.
            const audioBuffer = Buffer.from(event.delta, 'base64');
            console.log(`Audio delta length: ${audioBuffer.length} bytes`);
            break;
        }
        case 'response.done':
            console.log(`Response ID: ${event.response.id}`);
            console.log(`The final response is: ${event.response.output[0].content[0].transcript}`);
            responseDone = true;
            break;
        default:
            console.warn(`Unhandled event type: ${event.type}`);
    }
}

main().catch((err) => {
    console.error('The sample encountered an error:', err);
});
export {
    main
};

Run the JavaScript file.
```
node index.js
```

Wait a few moments to get the response.

Output

The script gets a response from the model and prints the transcript and audio data it receives.

The output looks similar to the following:

Waiting for events...
Waiting for session.created event...
Received an event: session.created
Session ID: sess_CQx8YO3vKxD9FaPxrbQ9R
Waiting for session.updated event...
Received an event: session.updated
Session ID: sess_CQx8YO3vKxD9FaPxrbQ9R
Waiting for response.done event...
Waiting for response.done event...
Waiting for response.done event...
Received an event: response.output_audio_transcript.delta
Transcript delta: Sure
Received an event: response.output_audio_transcript.delta
Transcript delta: ,
Received an event: response.output_audio_transcript.delta
Transcript delta:  I
Waiting for response.done event...
Waiting for response.done event...
Received an event: response.output_audio.delta
Audio delta length: 4800 bytes
Received an event: response.output_audio.delta
Audio delta length: 7200 bytes
Waiting for response.done event...
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio_transcript.delta
Transcript delta: 'm
Received an event: response.output_audio_transcript.delta
Transcript delta:  here
Received an event: response.output_audio_transcript.delta
Transcript delta:  to
Received an event: response.output_audio_transcript.delta
Transcript delta:  help
Received an event: response.output_audio_transcript.delta
Transcript delta: .
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Waiting for response.done event...
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio_transcript.delta
Transcript delta:  What
Received an event: response.output_audio_transcript.delta
Transcript delta:  do
Received an event: response.output_audio_transcript.delta
Transcript delta:  you
Waiting for response.done event...
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio_transcript.delta
Transcript delta:  need
Received an event: response.output_audio_transcript.delta
Transcript delta:  assistance
Received an event: response.output_audio_transcript.delta
Transcript delta:  with
Received an event: response.output_audio_transcript.delta
Transcript delta: ?
Waiting for response.done event...
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Waiting for response.done event...
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Waiting for response.done event...
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio.delta
Audio delta length: 28800 bytes
Received an event: response.done
Response ID: resp_CQx8YwQCszDqSUXRutxP9
The final response is: Sure, I'm here to help. What do you need assistance with?
The sample completed successfully.
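The audio delta lengths in this output can be converted into playback time. The following is a minimal sketch, not part of the quickstart, and it rests on two assumptions that the article doesn't state: that the audio/pcm format carries 16-bit samples and that the stream is mono. The 24000 Hz rate comes from the session configuration above. The arithmetic is shown in Python and is the same in any language.

```python
SAMPLE_RATE_HZ = 24000  # "rate": 24000 in the session configuration
BYTES_PER_SAMPLE = 2    # assumption: 16-bit PCM samples
CHANNELS = 1            # assumption: mono audio


def pcm_duration_seconds(num_bytes: int) -> float:
    """Return the playback time of a raw PCM chunk of the given size."""
    return num_bytes / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS)


# Chunk sizes taken from the sample output above.
for size in (4800, 7200, 12000, 28800):
    print(f"{size} bytes -> {pcm_duration_seconds(size):.2f} s")
```

Under these assumptions, the recurring 12000-byte deltas each hold a quarter of a second of audio.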

Prerequisites (Python)

An Azure subscription. You can create one for free.
Python 3.8 or later; we recommend Python 3.10 or later. If you don't have a suitable version of Python installed, you can follow the instructions in the VS Code Python Tutorial for the easiest way of installing Python on your operating system.
An Azure OpenAI resource created in one of the supported regions. For more information about region availability, see the models and versions documentation.
Then, you need to deploy a gpt-realtime or gpt-realtime-mini model with your Azure OpenAI resource. For more information, see "Create a resource and deploy a model with Azure OpenAI".

Microsoft Entra ID prerequisites

For the recommended keyless authentication with Microsoft Entra ID, you need to:

Install the Azure CLI, which is used for keyless authentication with Microsoft Entra ID.
Assign the Cognitive Services OpenAI User role to your user account. You can assign roles in the Azure portal under Access control (IAM) > Add role assignment.

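Deploy a model for real-time audio

To deploy the gpt-realtime model in the Microsoft Foundry portal:

1. Go to the Foundry portal and create or select your project.
2. Select your model deployments:
   a. For an Azure OpenAI resource, select Deployments from the Shared resources section in the left pane.
   b. For a Foundry resource, select Models + endpoints under My assets in the left pane.
3. Select + Deploy model > Deploy base model to open the deployment window.
4. Search for and select the gpt-realtime model, and then select Confirm.
5. Review the deployment details and select Deploy.
6. Follow the wizard to finish deploying the model.

Now that you have a deployment of the gpt-realtime model, you can interact with it in the Foundry portal Audio playground or through the Realtime API.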

Setup

Create a new folder named realtime-audio-quickstart-py and go to the quickstart folder with the following command:
```
mkdir realtime-audio-quickstart-py && cd realtime-audio-quickstart-py
```
Create a virtual environment. If you already have Python 3.10 or later installed, you can create a virtual environment with the following commands.

Windows:
```
py -3 -m venv .venv
.venv\scripts\activate
```
Linux or macOS:
```
python3 -m venv .venv
source .venv/bin/activate
```
Activating the Python environment means that when you run python or pip from the command line, you use the Python interpreter contained in the .venv folder of your application. You can use the deactivate command to exit the Python virtual environment, and you can reactivate it later when needed.

Tip

We recommend that you create and activate a new Python environment to install the packages you need for this tutorial. Don't install packages into your global Python installation. Always use a virtual or conda environment when installing Python packages; otherwise you can break your global installation of Python.

Install the OpenAI Python client library with:
```
pip install "openai[realtime]"
```
Note

This library is maintained by OpenAI. Refer to the release history to track the latest updates to the library.

For the recommended keyless authentication with Microsoft Entra ID, install the azure-identity package with:
```
pip install azure-identity
```

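Retrieve resource information

You need to retrieve the following information to authenticate your application with your Azure OpenAI resource. The variables you need depend on whether you authenticate with Microsoft Entra ID or with an API key.

Microsoft Entra ID

Variable name	Value
`AZURE_OPENAI_ENDPOINT`	This value can be found in the Keys and Endpoint section when examining your resource from the Azure portal.
`AZURE_OPENAI_DEPLOYMENT_NAME`	This value corresponds to the custom name you chose for your deployment when you deployed a model. This value can be found under Resource Management > Model Deployments in the Azure portal.

Learn more about keyless authentication and setting environment variables.

API key

Variable name	Value
`AZURE_OPENAI_ENDPOINT`	This value can be found in the Keys and Endpoint section when examining your resource from the Azure portal.
`AZURE_OPENAI_API_KEY`	This value can be found in the Keys and Endpoint section when examining your resource from the Azure portal. You can use either `KEY1` or `KEY2`.
`AZURE_OPENAI_DEPLOYMENT_NAME`	This value corresponds to the custom name you chose for your deployment when you deployed a model. This value can be found under Resource Management > Model Deployments in the Azure portal.

Learn more about finding API keys and setting environment variables.

Important

For more information about AI services security, see "Authenticate requests to Azure AI services".

Caution

To use the recommended keyless authentication with the SDK, make sure that the AZURE_OPENAI_API_KEY environment variable isn't set.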
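Before running the samples, it can help to verify that the environment matches the chosen authentication mode. The sketch below is one possible pre-flight check, not part of the quickstart: the function name check_environment and its keyless flag are hypothetical, and only the three environment variable names come from this article.

```python
import os

# Variables every sample in this article reads.
REQUIRED_VARIABLES = ("AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_DEPLOYMENT_NAME")


def check_environment(env, keyless=True):
    """Return a list of problems found; an empty list means the setup looks fine.

    Hypothetical helper. `env` is any mapping, such as os.environ.
    """
    problems = [f"{name} is not set" for name in REQUIRED_VARIABLES if not env.get(name)]
    has_api_key = bool(env.get("AZURE_OPENAI_API_KEY"))
    if keyless and has_api_key:
        problems.append("AZURE_OPENAI_API_KEY is set; unset it for keyless authentication")
    if not keyless and not has_api_key:
        problems.append("AZURE_OPENAI_API_KEY is not set")
    return problems


# Check the current process environment for the keyless (Microsoft Entra ID) mode.
for problem in check_environment(os.environ, keyless=True):
    print(problem)
```

Passing the mapping in as a parameter, rather than reading os.environ inside the function, keeps the check easy to exercise with a plain dictionary.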

Text in, audio out

Create the text-in-audio-out.py file with the following code. This version uses keyless authentication with Microsoft Entra ID:

import os
import base64
import asyncio
from openai import AsyncOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

async def main() -> None:
    """
    When prompted for user input, type a message and hit enter to send it to the model.
    Enter "q" to quit the conversation.
    """

    credential = DefaultAzureCredential()
    token_provider = get_bearer_token_provider(credential, "https://cognitiveservices.azure.com/.default")
    token = token_provider()

    # The endpoint of your Azure OpenAI resource is required. You can set it in the AZURE_OPENAI_ENDPOINT
    # environment variable.
    # You can find it in the Microsoft Foundry portal in the Overview page of your Azure OpenAI resource.
    # Example: https://{your-resource}.openai.azure.com
    endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]

    # The deployment name of the model you want to use is required. You can set it in the AZURE_OPENAI_DEPLOYMENT_NAME
    # environment variable.
    # You can find it in the Foundry portal in the "Models + endpoints" page of your Azure OpenAI resource.
    # Example: gpt-realtime
    deployment_name = os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"]

    base_url = endpoint.replace("https://", "wss://").rstrip("/") + "/openai/v1"

    # The APIs are compatible with the OpenAI client library.
    # You can use the OpenAI client library to access the Azure OpenAI APIs.
    # Make sure to set the baseURL and apiKey to use the Azure OpenAI endpoint and token.
    client = AsyncOpenAI(
        websocket_base_url=base_url,
        api_key=token
    )
    async with client.realtime.connect(
        model=deployment_name,
    ) as connection:
        # after the connection is created, configure the session.
        await connection.session.update(session={
            "type": "realtime",
            "instructions": "You are a helpful assistant. You respond by voice and text.",
            "output_modalities": ["audio"],
            "audio": {
                "input": {
                    "transcription": {
                        "model": "whisper-1",
                    },
                    "format": {
                        "type": "audio/pcm",
                        "rate": 24000,
                    },
                    "turn_detection": {
                        "type": "server_vad",
                        "threshold": 0.5,
                        "prefix_padding_ms": 300,
                        "silence_duration_ms": 200,
                        "create_response": True,
                    }
                },
                "output": {
                    "voice": "alloy",
                    "format": {
                        "type": "audio/pcm",
                        "rate": 24000,
                    }
                }
            }
        })

        # After the session is configured, data can be sent to the session.
        while True:
            user_input = input("Enter a message: ")
            if user_input == "q":
                print("Stopping the conversation.")
                break

            await connection.conversation.item.create(
                item={
                    "type": "message",
                    "role": "user",
                    "content": [{"type": "input_text", "text": user_input}],
                }
            )
            await connection.response.create()
            async for event in connection:
                if event.type == "response.output_text.delta":
                    print(event.delta, flush=True, end="")
                elif event.type == "session.created":
                    print(f"Session ID: {event.session.id}")
                elif event.type == "response.output_audio.delta":
                    audio_data = base64.b64decode(event.delta)
                    print(f"Received {len(audio_data)} bytes of audio data.")
                elif event.type == "response.output_audio_transcript.delta":
                    print(f"Received text delta: {event.delta}")
                elif event.type == "response.output_text.done":
                    print()
                elif event.type == "error":
                    print("Received an error event.")
                    print(f"Error code: {event.error.code}")
                    print(f"Error Event ID: {event.error.event_id}")
                    print(f"Error message: {event.error.message}")
                elif event.type == "response.done":
                    break

    print("Conversation ended.")
    credential.close()

asyncio.run(main())

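Sign in to Azure with the following command:
```
az login
```
Run the Python file.
```
python text-in-audio-out.py
```
When prompted for user input, type a message and press Enter to send it to the model. Enter "q" to quit the conversation.

Create the text-in-audio-out.py file with the following code. This version uses API key authentication: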

import os
import base64
import asyncio
from openai import AsyncOpenAI

async def main() -> None:
    """
    When prompted for user input, type a message and hit enter to send it to the model.
    Enter "q" to quit the conversation.
    """

    # The endpoint of your Azure OpenAI resource is required. You can set it in the AZURE_OPENAI_ENDPOINT
    # environment variable.
    # You can find it in the Foundry portal in the Overview page of your Azure OpenAI resource.
    # Example: https://{your-resource}.openai.azure.com
    endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
    base_url = endpoint.replace("https://", "wss://").rstrip("/") + "/openai/v1"

    # The deployment name of the model you want to use is required. You can set it in the AZURE_OPENAI_DEPLOYMENT_NAME
    # environment variable.
    # You can find it in the Foundry portal in the "Models + endpoints" page of your Azure OpenAI resource.
    # Example: gpt-realtime
    deployment_name = os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"]

    # API Key of your Azure OpenAI resource is required. You can set it in the AZURE_OPENAI_API_KEY
    # environment variable.
    # You can find it in the Foundry portal in the Overview page of your Azure OpenAI resource.
    token = os.environ["AZURE_OPENAI_API_KEY"]

    # The APIs are compatible with the OpenAI client library.
    # You can use the OpenAI client library to access the Azure OpenAI APIs.
    # Make sure to set the baseURL and apiKey to use the Azure OpenAI endpoint and token.
    client = AsyncOpenAI(
        websocket_base_url=base_url,
        api_key=token
    )
    async with client.realtime.connect(
        model=deployment_name,
    ) as connection:
        # after the connection is created, configure the session.
        await connection.session.update(session={
            "type": "realtime",
            "instructions": "You are a helpful assistant. You respond by voice and text.",
            "output_modalities": ["audio"],
            "audio": {
                "input": {
                    "transcription": {
                        "model": "whisper-1",
                    },
                    "format": {
                        "type": "audio/pcm",
                        "rate": 24000,
                    },
                    "turn_detection": {
                        "type": "server_vad",
                        "threshold": 0.5,
                        "prefix_padding_ms": 300,
                        "silence_duration_ms": 200,
                        "create_response": True,
                    }
                },
                "output": {
                    "voice": "alloy",
                    "format": {
                        "type": "audio/pcm",
                        "rate": 24000,
                    }
                }
            }
        })

        # After the session is configured, data can be sent to the session.
        while True:
            user_input = input("Enter a message: ")
            if user_input == "q":
                print("Stopping the conversation.")
                break

            await connection.conversation.item.create(
                item={
                    "type": "message",
                    "role": "user",
                    "content": [{"type": "input_text", "text": user_input}],
                }
            )
            await connection.response.create()
            async for event in connection:
                if event.type == "response.output_text.delta":
                    print(event.delta, flush=True, end="")
                elif event.type == "session.created":
                    print(f"Session ID: {event.session.id}")
                elif event.type == "response.output_audio.delta":
                    audio_data = base64.b64decode(event.delta)
                    print(f"Received {len(audio_data)} bytes of audio data.")
                elif event.type == "response.output_audio_transcript.delta":
                    print(f"Received text delta: {event.delta}")
                elif event.type == "response.output_text.done":
                    print()
                elif event.type == "error":
                    print("Received an error event.")
                    print(f"Error code: {event.error.code}")
                    print(f"Error Event ID: {event.error.event_id}")
                    print(f"Error message: {event.error.message}")
                elif event.type == "response.done":
                    break

    print("Conversation ended.")

asyncio.run(main())

Run the Python file.
```
python text-in-audio-out.py
```
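When prompted for user input, type a message and press Enter to send it to the model. Enter "q" to end the conversation.

Wait a few moments to get the response.

Output

The script gets a response from the model and prints the transcript and audio data it receives.

The output looks similar to the following: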

Enter a message: How are you today?
Session ID: sess_CgAuonaqdlSNNDTdqBagI
Received text delta: I'm
Received text delta:  doing
Received text delta:  well
Received text delta: ,
Received 4800 bytes of audio data.
Received 7200 bytes of audio data.
Received 12000 bytes of audio data.
Received text delta:  thank
Received text delta:  you
Received text delta:  for
Received text delta:  asking
Received text delta: !
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received text delta:  How
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received text delta:  about
Received text delta:  you
Received text delta: —
Received text delta: how
Received text delta:  are
Received text delta:  you
Received text delta:  feeling
Received text delta:  today
Received text delta: ?
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received 24000 bytes of audio data.
Enter a message: q
Stopping the conversation.
Conversation ended.
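
The byte counts in this log can be converted into playback time, and the decoded chunks can be saved for listening. The following is a minimal sketch, not part of the quickstart script. It assumes that `audio/pcm` at a rate of 24000 means 16-bit mono PCM (2 bytes per sample), which this article does not state explicitly. The helper names `pcm_duration_seconds` and `pcm_to_wav_bytes` are hypothetical.

```python
import io
import wave
from typing import Iterable

SAMPLE_RATE = 24000  # matches "rate": 24000 in the session configuration
SAMPLE_WIDTH = 2     # assumption: 16-bit samples (2 bytes each)
CHANNELS = 1         # assumption: mono audio


def pcm_duration_seconds(num_bytes: int) -> float:
    """Return the playback time of a raw PCM chunk of the given size."""
    return num_bytes / (SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS)


def pcm_to_wav_bytes(chunks: Iterable[bytes]) -> bytes:
    """Join decoded audio deltas and wrap them in a WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(SAMPLE_WIDTH)
        wav_file.setframerate(SAMPLE_RATE)
        wav_file.writeframes(b"".join(chunks))
    return buffer.getvalue()


# Under the assumptions above, a 12000-byte delta is a quarter second of audio.
print(pcm_duration_seconds(12000))
```

To use this with the script, you could append each `audio_data` value to a list in the `response.output_audio.delta` branch and write the result of `pcm_to_wav_bytes` to a `.wav` file after `response.done`.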

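Prerequisites

An Azure subscription. Create one for free.
Node.js LTS or ESM support.
TypeScript installed globally.
An Azure OpenAI resource created in one of the supported regions. For more information about region availability, see the models and versions documentation.
Then, you need to deploy a gpt-realtime model with your Azure OpenAI resource. For more information, see Create a resource and deploy a model with Azure OpenAI.

Microsoft Entra ID prerequisites

For the recommended keyless authentication with Microsoft Entra ID, you need to:

Install the Azure CLI used for keyless authentication with Microsoft Entra ID.
Assign the Cognitive Services OpenAI User role to your user account. You can assign roles in the Azure portal under Access control (IAM) > Add role assignment.

Deploy a model for real-time audio

To deploy the gpt-realtime model in the Microsoft Foundry portal:

Go to the Foundry portal and create or select your project.
Select your model deployments:
1. For an Azure OpenAI resource, select Deployments from the Shared resources section in the left pane.
2. For a Foundry resource, select Models + endpoints from under My assets in the left pane.
Select + Deploy model > Deploy base model to open the deployment window.
Search for and select the gpt-realtime model, and then select Confirm.
Review the deployment details and select Deploy.
Follow the wizard to finish deploying the model.

Now that you have a deployment of the gpt-realtime model, you can interact with it in the Foundry portal Audio playground or through the Realtime API.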

Set up

Create a new folder named realtime-audio-quickstart-ts and go to the quickstart folder with the following command:
```
mkdir realtime-audio-quickstart-ts && cd realtime-audio-quickstart-ts
```
Create the package.json with the following command:
```
npm init -y
```
Update the package.json to ECMAScript with the following command:
```
npm pkg set type=module
```
Install the OpenAI client library for JavaScript with:
```
npm install openai
```
Install the dependent package used by the OpenAI client library for JavaScript with:
```
npm install ws
```
For the recommended keyless authentication with Microsoft Entra ID, install the @azure/identity package with:
```
npm install @azure/identity
```

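Retrieve resource information

You need to retrieve the following information to authenticate your application with your Azure OpenAI resource.

Microsoft Entra ID

Variable name	Value
`AZURE_OPENAI_ENDPOINT`	This value can be found in the Keys and Endpoint section when examining your resource from the Azure portal.
`AZURE_OPENAI_DEPLOYMENT_NAME`	This value corresponds to the custom name you chose for your deployment when you deployed a model. This value can be found under Resource Management > Model Deployments in the Azure portal.

Learn more about keyless authentication and setting environment variables.

API key

Variable name	Value
`AZURE_OPENAI_ENDPOINT`	This value can be found in the Keys and Endpoint section when examining your resource from the Azure portal.
`AZURE_OPENAI_API_KEY`	This value can be found in the Keys and Endpoint section when examining your resource from the Azure portal. You can use either `KEY1` or `KEY2`.
`AZURE_OPENAI_DEPLOYMENT_NAME`	This value corresponds to the custom name you chose for your deployment when you deployed a model. This value can be found under Resource Management > Model Deployments in the Azure portal.

Learn more about finding API keys and setting environment variables.

Important

For more information about AI services security, see Authenticate requests to Azure AI services.

Caution

To use the recommended keyless authentication with the SDK, make sure that the AZURE_OPENAI_API_KEY environment variable isn't set.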
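
Before running a sample, it can help to confirm that these variables are set as intended. The following is a small sketch, written in Python for brevity, although the same checks apply to the TypeScript sample. The variable names come from the tables above, and the `/openai/v1` path matches the base URL built in this article's code. The helper names (`missing_variables`, `build_base_url`, `auth_mode`) are hypothetical, and `auth_mode` only reports whether `AZURE_OPENAI_API_KEY` is set. It does not authenticate anything.

```python
import os

REQUIRED_VARIABLES = ("AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_DEPLOYMENT_NAME")


def missing_variables(env):
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARIABLES if not env.get(name)]


def build_base_url(endpoint):
    """Drop trailing slashes and append the /openai/v1 path used by the samples."""
    return endpoint.rstrip("/") + "/openai/v1"


def auth_mode(env):
    """Report "api-key" when AZURE_OPENAI_API_KEY is set, otherwise "keyless"."""
    return "api-key" if env.get("AZURE_OPENAI_API_KEY") else "keyless"


if __name__ == "__main__":
    missing = missing_variables(os.environ)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Base URL:", build_base_url(os.environ["AZURE_OPENAI_ENDPOINT"]))
        print("Authentication:", auth_mode(os.environ))
```

If the script reports `api-key` when you intended to use keyless authentication, unset `AZURE_OPENAI_API_KEY` in your shell before continuing.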

Text in, audio out

Microsoft Entra ID

Create the index.ts file with the following code:

import OpenAI from 'openai';
import { OpenAIRealtimeWS } from 'openai/realtime/ws';
import { OpenAIRealtimeError } from 'openai/realtime/internal-base';
import { DefaultAzureCredential, getBearerTokenProvider } from "@azure/identity";
import { RealtimeSessionCreateRequest } from 'openai/resources/realtime/realtime';

let isCreated = false;
let isConfigured = false;
let responseDone = false;

// Set this to false, if you want to continue receiving events after an error is received.
const throwOnError = true;

async function main(): Promise<void> {
    // The endpoint of your Azure OpenAI resource is required. You can set it in the AZURE_OPENAI_ENDPOINT
    // environment variable or replace the default value below.
    // You can find it in the Microsoft Foundry portal in the Overview page of your Azure OpenAI resource.
    // Example: https://{your-resource}.openai.azure.com
    const endpoint = process.env.AZURE_OPENAI_ENDPOINT || 'AZURE_OPENAI_ENDPOINT';
    const baseUrl = endpoint.replace(/\/$/, "") + '/openai/v1';

    // The deployment name of your Azure OpenAI model is required. You can set it in the AZURE_OPENAI_DEPLOYMENT_NAME
    // environment variable or replace the default value below.
    // You can find it in the Foundry portal in the "Models + endpoints" page of your Azure OpenAI resource.
    // Example: gpt-realtime
    const deploymentName = process.env.AZURE_OPENAI_DEPLOYMENT_NAME || 'gpt-realtime';

    // Keyless authentication
    const credential = new DefaultAzureCredential();
    const scope = "https://cognitiveservices.azure.com/.default";
    const azureADTokenProvider = getBearerTokenProvider(credential, scope);
    const token = await azureADTokenProvider();

    // The APIs are compatible with the OpenAI client library.
    // You can use the OpenAI client library to access the Azure OpenAI APIs.
    // Make sure to set the baseURL and apiKey to use the Azure OpenAI endpoint and token.
    const openAIClient = new OpenAI({
        baseURL: baseUrl,
        apiKey: token,
    });
    const realtimeClient = await OpenAIRealtimeWS.create(openAIClient, { model: deploymentName });

    realtimeClient.on('error', (receivedError) => receiveError(receivedError));
    realtimeClient.on('session.created', (receivedEvent) => receiveEvent(receivedEvent));
    realtimeClient.on('session.updated', (receivedEvent) => receiveEvent(receivedEvent));
    realtimeClient.on('response.output_audio.delta', (receivedEvent) => receiveEvent(receivedEvent));
    realtimeClient.on('response.output_audio_transcript.delta', (receivedEvent) => receiveEvent(receivedEvent));
    realtimeClient.on('response.done', (receivedEvent) => receiveEvent(receivedEvent));

    console.log('Waiting for events...');
    while (!isCreated) {
        console.log('Waiting for session.created event...');
        await new Promise((resolve) => setTimeout(resolve, 100));
    }

    // After the session is created, configure it to enable audio input and output.
    const sessionConfig: RealtimeSessionCreateRequest = {
        'type': 'realtime',
        'instructions': 'You are a helpful assistant. You respond by voice and text.',
        'output_modalities': ['audio'],
        'audio': {
            'input': {
                'transcription': {
                    'model': 'whisper-1'
                },
                'format': {
                    'type': 'audio/pcm',
                    'rate': 24000,
                },
                'turn_detection': {
                    'type': 'server_vad',
                    'threshold': 0.5,
                    'prefix_padding_ms': 300,
                    'silence_duration_ms': 200,
                    'create_response': true
                }
            },
            'output': {
                'voice': 'alloy',
                'format': {
                    'type': 'audio/pcm',
                    'rate': 24000,
                }
            }
        }
    };

    realtimeClient.send({ 'type': 'session.update', 'session': sessionConfig });

    while (!isConfigured) {
        console.log('Waiting for session.updated event...');
        await new Promise((resolve) => setTimeout(resolve, 100));
    }

    // After the session is configured, data can be sent to the session.
    realtimeClient.send({
        'type': 'conversation.item.create',
        'item': {
            'type': 'message',
            'role': 'user',
            'content': [{ type: 'input_text', text: 'Please assist the user.' }]
        }
    });

    realtimeClient.send({ type: 'response.create' });

    // While waiting for the session to finish, the events can be handled in the event handlers.
    // In this example, we just wait for the first response.done event. 
    while (!responseDone) {
        console.log('Waiting for response.done event...');
        await new Promise((resolve) => setTimeout(resolve, 100));
    }

    console.log('The sample completed successfully.');
    realtimeClient.close();
}

function receiveError(errorEvent: OpenAIRealtimeError): void {
    if (errorEvent instanceof OpenAIRealtimeError) {
        console.error('Received an error event.');
        console.error(`Message: ${errorEvent.message}`);
        console.error(`Stack: ${errorEvent.stack}`);
    }

    if (throwOnError) {
        throw errorEvent;
    }
}

function receiveEvent(event: any): void {
    console.log(`Received an event: ${event.type}`);

    switch (event.type) {
        case 'session.created':
            console.log(`Session ID: ${event.session.id}`);
            isCreated = true;
            break;
        case 'session.updated':
            console.log(`Session ID: ${event.session.id}`);
            isConfigured = true;
            break;
        case 'response.output_audio_transcript.delta':
            console.log(`Transcript delta: ${event.delta}`);
            break;
        case 'response.output_audio.delta':
            let audioBuffer = Buffer.from(event.delta, 'base64');
            console.log(`Audio delta length: ${audioBuffer.length} bytes`);
            break;
        case 'response.done':
            console.log(`Response ID: ${event.response.id}`);
            console.log(`The final response is: ${event.response.output[0].content[0].transcript}`);
            responseDone = true;
            break;
        default:
            console.warn(`Unhandled event type: ${event.type}`);
    }
}

main().catch((err) => {
    console.error("The sample encountered an error:", err);
});

export { main };

Create the tsconfig.json file to transpile the TypeScript code, and copy the following configuration for ECMAScript.

{
    "compilerOptions": {
      "module": "NodeNext",
      "target": "ES2022", // Supports top-level await
      "moduleResolution": "NodeNext",
      "skipLibCheck": true, // Avoid type errors from node_modules
      "strict": true // Enable strict type-checking options
    },
    "include": ["*.ts"]
}

Install the type definitions for Node:
```
npm i --save-dev @types/node
```
Transpile from TypeScript to JavaScript:
```
tsc
```
Sign in to Azure with the following command:
```
az login
```
Run the code with the following command:
```
node index.js
```

API key

Create the index.ts file with the following code:

import OpenAI from 'openai';
import { OpenAIRealtimeWS } from 'openai/realtime/ws';
import { OpenAIRealtimeError } from 'openai/realtime/internal-base';
import { RealtimeSessionCreateRequest } from 'openai/resources/realtime/realtime';

let isCreated = false;
let isConfigured = false;
let responseDone = false;

// Set this to false, if you want to continue receiving events after an error is received.
const throwOnError = true;

async function main(): Promise<void> {
    // The endpoint of your Azure OpenAI resource is required. You can set it in the AZURE_OPENAI_ENDPOINT
    // environment variable or replace the default value below.
    // You can find it in the Foundry portal in the Overview page of your Azure OpenAI resource.
    // Example: https://{your-resource}.openai.azure.com
    const endpoint = process.env.AZURE_OPENAI_ENDPOINT || 'AZURE_OPENAI_ENDPOINT';
    const baseUrl = endpoint.replace(/\/$/, "") + '/openai/v1';

    // The deployment name of your Azure OpenAI model is required. You can set it in the AZURE_OPENAI_DEPLOYMENT_NAME
    // environment variable or replace the default value below.
    // You can find it in the Foundry portal in the "Models + endpoints" page of your Azure OpenAI resource.
    // Example: gpt-realtime
    const deploymentName = process.env.AZURE_OPENAI_DEPLOYMENT_NAME || 'gpt-realtime';

    // API Key of your Azure OpenAI resource is required. You can set it in the AZURE_OPENAI_API_KEY
    // environment variable or replace the default value below.
    // You can find it in the Foundry portal in the Overview page of your Azure OpenAI resource.
    const token = process.env.AZURE_OPENAI_API_KEY || '<Your API Key>';

    // The APIs are compatible with the OpenAI client library.
    // You can use the OpenAI client library to access the Azure OpenAI APIs.
    // Make sure to set the baseURL and apiKey to use the Azure OpenAI endpoint and token.
    const openAIClient = new OpenAI({
        baseURL: baseUrl,
        apiKey: token,
    });

    // Due to the current SDK limitation we need to explicitly
    // pass API key as Header
    const realtimeClient = await OpenAIRealtimeWS.create(
        openAIClient, {
        model: deploymentName,
        options: {
            headers: {
                "api-key": token
            }
        }
    });

    realtimeClient.on('error', (receivedError) => receiveError(receivedError));
    realtimeClient.on('session.created', (receivedEvent) => receiveEvent(receivedEvent));
    realtimeClient.on('session.updated', (receivedEvent) => receiveEvent(receivedEvent));
    realtimeClient.on('response.output_audio.delta', (receivedEvent) => receiveEvent(receivedEvent));
    realtimeClient.on('response.output_audio_transcript.delta', (receivedEvent) => receiveEvent(receivedEvent));
    realtimeClient.on('response.done', (receivedEvent) => receiveEvent(receivedEvent));

    console.log('Waiting for events...');
    while (!isCreated) {
        console.log('Waiting for session.created event...');
        await new Promise((resolve) => setTimeout(resolve, 100));
    }

    // After the session is created, configure it to enable audio input and output.
    const sessionConfig: RealtimeSessionCreateRequest = {
        'type': 'realtime',
        'instructions': 'You are a helpful assistant. You respond by voice and text.',
        'output_modalities': ['audio'],
        'audio': {
            'input': {
                'transcription': {
                    'model': 'whisper-1'
                },
                'format': {
                    'type': 'audio/pcm',
                    'rate': 24000,
                },
                'turn_detection': {
                    'type': 'server_vad',
                    'threshold': 0.5,
                    'prefix_padding_ms': 300,
                    'silence_duration_ms': 200,
                    'create_response': true
                }
            },
            'output': {
                'voice': 'alloy',
                'format': {
                    'type': 'audio/pcm',
                    'rate': 24000,
                }
            }
        }
    };

    realtimeClient.send({
        'type': 'session.update',
        'session': sessionConfig
    });
    while (!isConfigured) {
        console.log('Waiting for session.updated event...');
        await new Promise((resolve) => setTimeout(resolve, 100));
    }

    // After the session is configured, data can be sent to the session.    
    realtimeClient.send({
        'type': 'conversation.item.create',
        'item': {
            'type': 'message',
            'role': 'user',
            'content': [{
                type: 'input_text',
                text: 'Please assist the user.'
            }
            ]
        }
    });

    realtimeClient.send({
        type: 'response.create'
    });

    // While waiting for the session to finish, the events can be handled in the event handlers.
    // In this example, we just wait for the first response.done event.
    while (!responseDone) {
        console.log('Waiting for response.done event...');
        await new Promise((resolve) => setTimeout(resolve, 100));
    }

    console.log('The sample completed successfully.');
    realtimeClient.close();
}

function receiveError(errorEvent: OpenAIRealtimeError): void {
    if (errorEvent instanceof OpenAIRealtimeError) {
        console.error('Received an error event.');
        console.error(`Message: ${errorEvent.message}`);
        console.error(`Stack: ${errorEvent.stack}`);
    }

    if (throwOnError) {
        throw errorEvent;
    }
}

function receiveEvent(event: any): void {
    console.log(`Received an event: ${event.type}`);

    switch (event.type) {
        case 'session.created':
            console.log(`Session ID: ${event.session.id}`);
            isCreated = true;
            break;
        case 'session.updated':
            console.log(`Session ID: ${event.session.id}`);
            isConfigured = true;
            break;
        case 'response.output_audio_transcript.delta':
            console.log(`Transcript delta: ${event.delta}`);
            break;
        case 'response.output_audio.delta':
            let audioBuffer = Buffer.from(event.delta, 'base64');
            console.log(`Audio delta length: ${audioBuffer.length} bytes`);
            break;
        case 'response.done':
            console.log(`Response ID: ${event.response.id}`);
            console.log(`The final response is: ${event.response.output[0].content[0].transcript}`);
            responseDone = true;
            break;
        default:
            console.warn(`Unhandled event type: ${event.type}`);
    }
}

main().catch((err) => {
    console.error("The sample encountered an error:", err);
});

export {
    main
};

Create the tsconfig.json file to transpile the TypeScript code, and copy the following configuration for ECMAScript.

{
    "compilerOptions": {
      "module": "NodeNext",
      "target": "ES2022", // Supports top-level await
      "moduleResolution": "NodeNext",
      "skipLibCheck": true, // Avoid type errors from node_modules
      "strict": true // Enable strict type-checking options
    },
    "include": ["*.ts"]
}

Install the type definitions for Node:
```
npm i --save-dev @types/node
```
Transpile from TypeScript to JavaScript:
```
tsc
```
Run the code with the following command:
```
node index.js
```

Wait a few moments to get the response.

Output

The script gets a response from the model and prints the transcript and audio data it receives.

The output looks similar to the following:

Waiting for events...
Waiting for session.created event...
Waiting for session.created event...
Waiting for session.created event...
Waiting for session.created event...
Waiting for session.created event...
Waiting for session.created event...
Waiting for session.created event...
Waiting for session.created event...
Waiting for session.created event...
Waiting for session.created event...
Received an event: session.created
Session ID: sess_CWQkREiv3jlU3gk48bm0a
Waiting for session.updated event...
Waiting for session.updated event...
Received an event: session.updated
Session ID: sess_CWQkREiv3jlU3gk48bm0a
Waiting for response.done event...
Waiting for response.done event...
Waiting for response.done event...
Waiting for response.done event...
Waiting for response.done event...
Received an event: response.output_audio_transcript.delta
Transcript delta: Sure
Received an event: response.output_audio_transcript.delta
Transcript delta: ,
Received an event: response.output_audio_transcript.delta
Transcript delta:  I'm
Received an event: response.output_audio_transcript.delta
Transcript delta:  here
Waiting for response.done event...
Received an event: response.output_audio.delta
Audio delta length: 4800 bytes
Waiting for response.done event...
Received an event: response.output_audio.delta
Audio delta length: 7200 bytes
Waiting for response.done event...
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio_transcript.delta
Transcript delta:  to
Received an event: response.output_audio_transcript.delta
Transcript delta:  help
Received an event: response.output_audio_transcript.delta
Transcript delta: .
Waiting for response.done event...
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio_transcript.delta
Transcript delta:  What
Received an event: response.output_audio_transcript.delta
Transcript delta:  would
Received an event: response.output_audio_transcript.delta
Transcript delta:  you
Received an event: response.output_audio_transcript.delta
Transcript delta:  like
Waiting for response.done event...
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio_transcript.delta
Transcript delta:  to
Received an event: response.output_audio_transcript.delta
Transcript delta:  do
Received an event: response.output_audio_transcript.delta
Transcript delta:  or
Received an event: response.output_audio_transcript.delta
Transcript delta:  know
Received an event: response.output_audio_transcript.delta
Transcript delta:  about
Received an event: response.output_audio_transcript.delta
Transcript delta: ?
Waiting for response.done event...
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Waiting for response.done event...
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Waiting for response.done event...
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio.delta
Audio delta length: 12000 bytes
Received an event: response.output_audio.delta
Audio delta length: 24000 bytes
Received an event: response.done
Response ID: resp_CWQkRBrCcCjtHgIEapA92
The final response is: Sure, I'm here to help. What would you like to do or know about?
The sample completed successfully.
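
The log above shows that the transcript arrives as small deltas whose concatenation matches the final transcript reported with `response.done`. As a rough illustration (in Python, and operating on copied log lines rather than on live events), the sketch below joins the first few deltas from this output. The helper name `join_transcript_deltas` is hypothetical.

```python
PREFIX = "Transcript delta: "


def join_transcript_deltas(log_lines):
    """Concatenate the delta text from every 'Transcript delta: ' log line."""
    return "".join(
        line[len(PREFIX):] for line in log_lines if line.startswith(PREFIX)
    )


# Lines copied from the start of the sample output; other lines are ignored.
sample_log = [
    "Received an event: response.output_audio_transcript.delta",
    "Transcript delta: Sure",
    "Transcript delta: ,",
    "Transcript delta:  I'm",
    "Transcript delta:  here",
    "Audio delta length: 4800 bytes",
    "Transcript delta:  to",
    "Transcript delta:  help",
    "Transcript delta: .",
]

print(join_transcript_deltas(sample_log))  # Sure, I'm here to help.
```

In the sample itself, the same effect could be had by appending `event.delta` to a string in the `response.output_audio_transcript.delta` case instead of only logging it.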

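Deploy a model for real-time audio

To deploy the gpt-realtime model in the Microsoft Foundry portal:

Go to the Foundry portal and create or select your project.
Select your model deployments:
1. For an Azure OpenAI resource, select Deployments from the Shared resources section in the left pane.
2. For a Foundry resource, select Models + endpoints from under My assets in the left pane.
Select + Deploy model > Deploy base model to open the deployment window.
Search for and select the gpt-realtime model, and then select Confirm.
Review the deployment details and select Deploy.
Follow the wizard to finish deploying the model.

Now that you have a deployment of the gpt-realtime model, you can interact with it in the Foundry portal Audio playground or through the Realtime API.

Use the GPT real-time audio

To chat with your deployed gpt-realtime model in the Foundry real-time audio playground, follow these steps:

Go to the Foundry portal and select the project that has your deployed gpt-realtime model.
Select Playgrounds from the left pane.
Select Audio playground > Try the Audio playground.

Note

The Chat playground doesn't support the gpt-realtime model. Use the Audio playground as described in this section.
Select your deployed gpt-realtime model from the dropdown.
Optionally, edit the contents of the Give the model instructions and context text box. Give the model instructions about how it should behave and any context it should reference when generating a response. You can describe the assistant's personality, tell it what it should and shouldn't answer, and tell it how to format responses.
Optionally, change settings such as threshold, prefix padding, and silence duration.
Select Start listening to start the session. You can speak into the microphone to start a chat.
You can interrupt the chat at any time by speaking. You can end the chat by selecting the Stop listening button.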
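
The threshold, prefix padding, and silence duration settings in the playground appear to correspond to the `turn_detection` fields used in this article's code samples (`threshold`, `prefix_padding_ms`, and `silence_duration_ms`). The sketch below builds that configuration in Python with the same default values as the samples. The helper name `build_turn_detection` is hypothetical, and the range check on `threshold` (0.0 to 1.0) is an assumption rather than something this article states.

```python
def build_turn_detection(threshold=0.5, prefix_padding_ms=300, silence_duration_ms=200):
    """Build a server VAD turn_detection block like the one in the code samples."""
    # Assumption: the activation threshold is a value between 0.0 and 1.0.
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    if prefix_padding_ms < 0 or silence_duration_ms < 0:
        raise ValueError("durations must not be negative")
    return {
        "type": "server_vad",
        "threshold": threshold,
        "prefix_padding_ms": prefix_padding_ms,      # audio kept before detected speech
        "silence_duration_ms": silence_duration_ms,  # silence that ends the user's turn
        "create_response": True,
    }


# Example: wait for a longer pause before treating the user's turn as finished.
print(build_turn_detection(silence_duration_ms=500))
```

The returned dictionary has the same shape as the `turn_detection` value under `audio.input` in the session configuration shown earlier in this article.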

Last updated on 2025-11-18
