Gather user input with the Recognize action

11/14/2023

This guide helps you get started with recognizing DTMF input provided by participants through the Azure Communication Services Call Automation SDK.

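Prerequisites

An Azure account with an active subscription. For details, see the page on creating an account for free.
An Azure Communication Services resource. See the article on creating an Azure Communication Services resource. Note the connection string for this resource.
A new web service application created with the Call Automation SDK.
The latest .NET library for your operating system.
The latest NuGet package.

For AI features

Create an Azure AI services resource and connect it to your Azure Communication Services resource.
Create a custom subdomain for your Azure AI services resource.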

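Technical specifications

The following parameters are available to customize the Recognize function:

Parameter	Type	Default (if not specified)	Description	Required or Optional
Prompt (for details on the Play action, see the Play action how-to guide)	FileSource, TextSource	Not set	The message to play before recognizing input.	Optional
InterToneTimeout	TimeSpan	2 seconds (min: 1 second, max: 60 seconds)	Limit in seconds that Azure Communication Services waits for the caller to press another digit (inter-digit timeout).	Optional
InitialSegmentationSilenceTimeoutInSeconds	Integer	0.5 seconds	How long the recognize action waits for input before considering it a timeout. See the speech recognition documentation for details.	Optional
RecognizeInputsType	Enum	dtmf	Type of input that is recognized. Options are dtmf, choices, speech, and speechordtmf.	Required
InitialSilenceTimeout	TimeSpan	5 seconds (min: 0 seconds; max: 300 seconds for DTMF, 20 seconds for Choices, 20 seconds for Speech)	Adjusts how much non-speech audio is allowed before a phrase before the recognition attempt ends in a "no match" result. See the speech recognition documentation for details.	Optional
MaxTonesToCollect	Integer	No default (min: 1)	Number of digits a developer expects as input from the participant.	Required
StopTones	IEnumeration<DtmfTone>	Not set	The digits a participant can press to escape out of a batch DTMF event.	Optional
InterruptPrompt	Bool	True	Whether the participant can interrupt the playMessage by pressing a digit.	Optional
InterruptCallMediaOperation	Bool	True	If this flag is set, it interrupts the current call media operation. For example, if audio is being played, it interrupts that operation and starts recognition.	Optional
OperationContext	String	Not set	String that developers can pass mid-action, useful for storing context about the events they receive.	Optional
Phrases	String	Not set	List of phrases associated with the label. Hearing any of these phrases results in a successful recognition.	Required
Tone	String	Not set	The tone to recognize if the user decides to press a number instead of using speech.	Optional
Label	String	Not set	The key value for recognition.	Required
Language	String	En-us	The language used for recognizing speech.	Optional
EndSilenceTimeout	TimeSpan	0.5 seconds	The final pause of the speaker, used to detect the final result that gets generated as speech.	Optional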
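
The numeric limits in the table can be checked locally before a recognize request is sent. The following is a minimal sketch in plain Python and is not part of the Call Automation SDK: the function name `validate_dtmf_options` and its parameter names are hypothetical, and only the ranges themselves come from the table above.

```python
def validate_dtmf_options(max_tones_to_collect,
                          inter_tone_timeout=2.0,
                          initial_silence_timeout=5.0):
    """Return a list of problems; an empty list means all values are in range.

    Timeouts are in seconds. The defaults mirror the table's default column.
    """
    problems = []
    if max_tones_to_collect < 1:
        problems.append("MaxTonesToCollect must be at least 1")
    if not 1 <= inter_tone_timeout <= 60:
        problems.append("InterToneTimeout must be between 1 and 60 seconds")
    if not 0 <= initial_silence_timeout <= 300:
        problems.append("InitialSilenceTimeout must be between 0 and 300 seconds for DTMF")
    return problems
```

Calling `validate_dtmf_options(3)` returns an empty list, while out-of-range values each add one message, so a caller can log every problem at once instead of failing on the first.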

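Note

When both dtmf and speech are specified in recognizeInputsType, the recognize action acts on the first input type received. If the user presses a keypad number first, the recognize action treats it as a dtmf event and continues listening for dtmf tones. If the user speaks first, the recognize action treats it as speech recognition and listens for voice input.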
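
The "first input wins" behavior can be pictured with a small local model. This is only an illustrative sketch, not service code: the function `resolve_input_mode` is hypothetical, and it assumes that later inputs of the other type are simply ignored, which the note does not spell out.

```python
def resolve_input_mode(inputs):
    """Model of speechordtmf: the first input fixes the mode for the whole attempt.

    `inputs` is a list of (kind, value) tuples in arrival order, where kind is
    "dtmf" or "speech". Returns (mode, values_kept_in_that_mode).
    """
    if not inputs:
        return None, []
    mode = inputs[0][0]  # the type of the first input decides the mode
    kept = [value for kind, value in inputs if kind == mode]
    return mode, kept
```

For example, a caller who presses `1`, then speaks, then presses `2` is handled as a DTMF attempt under this model, and only the two key presses are kept.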

Create a new C# application

In the console window of your operating system, use the dotnet command to create a new web application.

dotnet new web -n MyApplication

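Install the NuGet package

If you haven't already done so, get the Azure.Communication.CallAutomation NuGet package.

Establish a call

By this point you should be familiar with starting calls. If you need to learn more about making a call, follow the quickstart on placing calls. You can also use the code snippet provided here to understand how to answer a call.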

var callAutomationClient = new CallAutomationClient("<Azure Communication Services connection string>");

var answerCallOptions = new AnswerCallOptions("<Incoming call context once call is connected>", new Uri("<https://sample-callback-uri>"))  
{  
    CallIntelligenceOptions = new CallIntelligenceOptions() { CognitiveServicesEndpoint = new Uri("<Azure Cognitive Services Endpoint>") } 
};  

var answerCallResult = await callAutomationClient.AnswerCallAsync(answerCallOptions);

Call the Recognize action

When your application answers the call, you can provide information about recognizing participant input and playing a prompt.

DTMF

var maxTonesToCollect = 3;
String textToPlay = "Welcome to Contoso, please enter 3 DTMF.";
var playSource = new TextSource(textToPlay, "en-US-ElizabethNeural");
var recognizeOptions = new CallMediaRecognizeDtmfOptions(targetParticipant, maxTonesToCollect)
{
    InitialSilenceTimeout = TimeSpan.FromSeconds(30),
    Prompt = playSource,
    InterToneTimeout = TimeSpan.FromSeconds(5),
    InterruptPrompt = true,
    StopTones = new DtmfTone[] { DtmfTone.Pound },
};
var recognizeResult = await callAutomationClient.GetCallConnection(callConnectionId)
    .GetCallMedia()
    .StartRecognizingAsync(recognizeOptions);

For speech-to-text flows, the Call Automation Recognize action also supports custom speech models. Custom speech models are useful when you're building an application that needs to listen for complex words that the default speech-to-text models may not understand. A good example is an application for the telemedicine industry, where the virtual agent needs to recognize medical terms. For details on creating and deploying custom speech models, see the custom speech documentation.

Speech-to-text choices

var choices = new List<RecognitionChoice>
{
    new RecognitionChoice("Confirm", new List<string> { "Confirm", "First", "One" })
    {
        Tone = DtmfTone.One
    },
    new RecognitionChoice("Cancel", new List<string> { "Cancel", "Second", "Two" })
    {
        Tone = DtmfTone.Two
    }
};
String textToPlay = "Hello, This is a reminder for your appointment at 2 PM, Say Confirm to confirm your appointment or Cancel to cancel the appointment. Thank you!";

var playSource = new TextSource(textToPlay, "en-US-ElizabethNeural");
var recognizeOptions = new CallMediaRecognizeChoiceOptions(targetParticipant, choices)
{
    InterruptPrompt = true,
    InitialSilenceTimeout = TimeSpan.FromSeconds(30),
    Prompt = playSource,
    OperationContext = "AppointmentReminderMenu",
    // Only add the SpeechModelEndpointId if you have a custom speech model you would like to use
    SpeechModelEndpointId = "YourCustomSpeechModelEndpointId"
};
var recognizeResult = await callAutomationClient.GetCallConnection(callConnectionId)
    .GetCallMedia()
    .StartRecognizingAsync(recognizeOptions);

Speech-to-text

String textToPlay = "Hi, how can I help you today?";
var playSource = new TextSource(textToPlay, "en-US-ElizabethNeural");
var recognizeOptions = new CallMediaRecognizeSpeechOptions(targetParticipant)
{
    Prompt = playSource,
    EndSilenceTimeout = TimeSpan.FromMilliseconds(1000),
    OperationContext = "OpenQuestionSpeech",
    // Only add the SpeechModelEndpointId if you have a custom speech model you would like to use
    SpeechModelEndpointId = "YourCustomSpeechModelEndpointId"
};
var recognizeResult = await callAutomationClient.GetCallConnection(callConnectionId)
    .GetCallMedia()
    .StartRecognizingAsync(recognizeOptions);

Speech-to-text or DTMF

var maxTonesToCollect = 1; 
String textToPlay = "Hi, how can I help you today, you can press 0 to speak to an agent?"; 
var playSource = new TextSource(textToPlay, "en-US-ElizabethNeural"); 
var recognizeOptions = new CallMediaRecognizeSpeechOrDtmfOptions(targetParticipant, maxTonesToCollect) 
{ 
    Prompt = playSource, 
    EndSilenceTimeout = TimeSpan.FromMilliseconds(1000), 
    InitialSilenceTimeout = TimeSpan.FromSeconds(30), 
    InterruptPrompt = true, 
    OperationContext = "OpenQuestionSpeechOrDtmf",
    //Only add the SpeechModelEndpointId if you have a custom speech model you would like to use
    SpeechModelEndpointId = "YourCustomSpeechModelEndpointId" 
}; 
var recognizeResult = await callAutomationClient.GetCallConnection(callConnectionId) 
    .GetCallMedia() 
    .StartRecognizingAsync(recognizeOptions);

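Note

If parameters aren't set, the defaults are applied where possible.

Receiving recognize event updates

Developers can subscribe to the RecognizeCompleted and RecognizeFailed events on the webhook callback they registered for the call, and build business logic in their application that decides the next step when one of these events occurs.

Example of how you can deserialize the RecognizeCompleted event: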

if (acsEvent is RecognizeCompleted recognizeCompleted)
{
    switch (recognizeCompleted.RecognizeResult)
    {
        case DtmfResult dtmfResult:
            // Take action for recognition through DTMF
            var tones = dtmfResult.Tones;
            logger.LogInformation("Recognize completed successfully, tones={tones}", tones);
            break;
        case ChoiceResult choiceResult:
            // Take action for recognition through Choices
            var labelDetected = choiceResult.Label;
            var phraseDetected = choiceResult.RecognizedPhrase;
            // If the choice is detected by phrase, choiceResult.RecognizedPhrase holds the detected phrase.
            // If the choice is detected by DTMF tone, the phrase is null.
            logger.LogInformation("Recognize completed successfully, labelDetected={labelDetected}, phraseDetected={phraseDetected}", labelDetected, phraseDetected);
            break;
        case SpeechResult speechResult:
            // Take action for recognition through Speech
            var text = speechResult.Speech;
            logger.LogInformation("Recognize completed successfully, text={text}", text);
            break;
        default:
            logger.LogInformation("Recognize completed successfully, recognizeResult={recognizeResult}", recognizeCompleted.RecognizeResult);
            break;
    }
}

Example of how you can deserialize the RecognizeFailed event:

if (acsEvent is RecognizeFailed recognizeFailed) 
{ 
    if (MediaEventReasonCode.RecognizeInitialSilenceTimedOut.Equals(recognizeFailed.ReasonCode)) 
    { 
        // Take action for time out 
        logger.LogInformation("Recognition failed: initial silence time out");
    } 
    else if (MediaEventReasonCode.RecognizeSpeechOptionNotMatched.Equals(recognizeFailed.ReasonCode)) 
    { 
        // Take action for option not matched 
        logger.LogInformation("Recognition failed: speech option not matched"); 
    } 
    else if (MediaEventReasonCode.RecognizeIncorrectToneDetected.Equals(recognizeFailed.ReasonCode)) 
    { 
        // Take action for incorrect tone 
        logger.LogInformation("Recognition failed: incorrect tone detected"); 
    } 
    else 
    { 
        logger.LogInformation("Recognition failed, result={result}, context={context}", recognizeFailed.ResultInformation?.Message, recognizeFailed.OperationContext); 
    } 
}

Example of how you can deserialize the RecognizeCanceled event:

if (acsEvent is RecognizeCanceled { OperationContext: "AppointmentReminderMenu" })
{
    logger.LogInformation($"RecognizeCanceled event received for call connection id: {acsEvent.CallConnectionId}");
    // Take action on the canceled recognize operation
    await callConnection.HangUpAsync(forEveryone: true);
}

Prerequisites

An Azure account with an active subscription. For details, see the page on creating an account for free.
An Azure Communication Services resource. See the article on creating an Azure Communication Services resource.
A new web service application created with the Call Automation SDK.
Java Development Kit version 8 or later.
Apache Maven.

For AI features

Create an Azure AI services resource and connect it to your Azure Communication Services resource.
Create a custom subdomain for your Azure AI services resource.

Create a new Java application

In your terminal or command window, navigate to the directory where you want to create your Java application. Run the mvn command to generate the Java project from the maven-archetype-quickstart template.

mvn archetype:generate -DgroupId=com.communication.quickstart -DartifactId=communication-quickstart -DarchetypeArtifactId=maven-archetype-quickstart -DarchetypeVersion=1.4 -DinteractiveMode=false

The mvn command creates a directory with the same name as the artifactId argument. Under this directory, src/main/java contains the project source code, src/test/java contains the test source, and pom.xml is the project's Project Object Model (POM).

Update your application's POM file to use Java 8 or later.

<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
</properties>

Add package references

In your POM file, add the following reference for the project:

azure-communication-callautomation

<dependency>
  <groupId>com.azure</groupId>
  <artifactId>azure-communication-callautomation</artifactId>
  <version>1.0.0</version>
</dependency>

Establish a call

CallIntelligenceOptions callIntelligenceOptions = new CallIntelligenceOptions().setCognitiveServicesEndpoint("https://sample-cognitive-service-resource.cognitiveservices.azure.com/");
AnswerCallOptions answerCallOptions = new AnswerCallOptions("<Incoming call context>", "<https://sample-callback-uri>").setCallIntelligenceOptions(callIntelligenceOptions);
Response<AnswerCallResult> answerCallResult = callAutomationClient
    .answerCallWithResponse(answerCallOptions)
    .block();

Call the Recognize action

When your application answers the call, you can provide information about recognizing participant input and playing a prompt.

DTMF

var maxTonesToCollect = 3;
String textToPlay = "Welcome to Contoso, please enter 3 DTMF.";
var playSource = new TextSource() 
    .setText(textToPlay) 
    .setVoiceName("en-US-ElizabethNeural");

var recognizeOptions = new CallMediaRecognizeDtmfOptions(targetParticipant, maxTonesToCollect) 
    .setInitialSilenceTimeout(Duration.ofSeconds(30)) 
    .setPlayPrompt(playSource) 
    .setInterToneTimeout(Duration.ofSeconds(5)) 
    .setInterruptPrompt(true) 
    .setStopTones(Arrays.asList(DtmfTone.POUND));

var recognizeResponse = callAutomationClient.getCallConnectionAsync(callConnectionId) 
    .getCallMediaAsync() 
    .startRecognizingWithResponse(recognizeOptions) 
    .block(); 

log.info("Start recognizing result: " + recognizeResponse.getStatusCode());

Speech-to-text choices

var choices = Arrays.asList(
  new RecognitionChoice()
  .setLabel("Confirm")
  .setPhrases(Arrays.asList("Confirm", "First", "One"))
  .setTone(DtmfTone.ONE),
  new RecognitionChoice()
  .setLabel("Cancel")
  .setPhrases(Arrays.asList("Cancel", "Second", "Two"))
  .setTone(DtmfTone.TWO)
);

String textToPlay = "Hello, This is a reminder for your appointment at 2 PM, Say Confirm to confirm your appointment or Cancel to cancel the appointment. Thank you!";
var playSource = new TextSource()
  .setText(textToPlay)
  .setVoiceName("en-US-ElizabethNeural");
var recognizeOptions = new CallMediaRecognizeChoiceOptions(targetParticipant, choices)
  .setInterruptPrompt(true)
  .setInitialSilenceTimeout(Duration.ofSeconds(30))
  .setPlayPrompt(playSource)
  .setOperationContext("AppointmentReminderMenu")
  //Only add the SpeechRecognitionModelEndpointId if you have a custom speech model you would like to use
  .setSpeechRecognitionModelEndpointId("YourCustomSpeechModelEndpointID"); 
var recognizeResponse = callAutomationClient.getCallConnectionAsync(callConnectionId)
  .getCallMediaAsync()
  .startRecognizingWithResponse(recognizeOptions)
  .block();

Speech-to-text

String textToPlay = "Hi, how can I help you today?"; 
var playSource = new TextSource() 
    .setText(textToPlay) 
    .setVoiceName("en-US-ElizabethNeural"); 
var recognizeOptions = new CallMediaRecognizeSpeechOptions(targetParticipant, Duration.ofMillis(1000)) 
    .setPlayPrompt(playSource) 
    .setOperationContext("OpenQuestionSpeech")
    //Only add the SpeechRecognitionModelEndpointId if you have a custom speech model you would like to use
    .setSpeechRecognitionModelEndpointId("YourCustomSpeechModelEndpointID");  
var recognizeResponse = callAutomationClient.getCallConnectionAsync(callConnectionId) 
    .getCallMediaAsync() 
    .startRecognizingWithResponse(recognizeOptions) 
    .block();

Speech-to-text or DTMF

var maxTonesToCollect = 1; 
String textToPlay = "Hi, how can I help you today, you can press 0 to speak to an agent?"; 
var playSource = new TextSource() 
    .setText(textToPlay) 
    .setVoiceName("en-US-ElizabethNeural"); 
var recognizeOptions = new CallMediaRecognizeSpeechOrDtmfOptions(targetParticipant, maxTonesToCollect, Duration.ofMillis(1000)) 
    .setPlayPrompt(playSource) 
    .setInitialSilenceTimeout(Duration.ofSeconds(30)) 
    .setInterruptPrompt(true) 
    .setOperationContext("OpenQuestionSpeechOrDtmf")
    //Only add the SpeechRecognitionModelEndpointId if you have a custom speech model you would like to use
    .setSpeechRecognitionModelEndpointId("YourCustomSpeechModelEndpointID");  
var recognizeResponse = callAutomationClient.getCallConnectionAsync(callConnectionId) 
    .getCallMediaAsync() 
    .startRecognizingWithResponse(recognizeOptions) 
    .block();

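Note

If parameters aren't set, the defaults are applied where possible.

Receiving recognize event updates

Developers can subscribe to the RecognizeCompleted and RecognizeFailed events on the registered webhook callback. Use this callback with the business logic in your application to decide the next step when one of the events occurs.

Example of how you can deserialize the RecognizeCompleted event: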

if (acsEvent instanceof RecognizeCompleted) { 
    RecognizeCompleted event = (RecognizeCompleted) acsEvent; 
    RecognizeResult recognizeResult = event.getRecognizeResult().get(); 
    if (recognizeResult instanceof DtmfResult) { 
        // Take action on collect tones 
        DtmfResult dtmfResult = (DtmfResult) recognizeResult; 
        List<DtmfTone> tones = dtmfResult.getTones(); 
        log.info("Recognition completed, tones=" + tones + ", context=" + event.getOperationContext()); 
    } else if (recognizeResult instanceof ChoiceResult) { 
        ChoiceResult collectChoiceResult = (ChoiceResult) recognizeResult; 
        String labelDetected = collectChoiceResult.getLabel(); 
        String phraseDetected = collectChoiceResult.getRecognizedPhrase(); 
        log.info("Recognition completed, labelDetected=" + labelDetected + ", phraseDetected=" + phraseDetected + ", context=" + event.getOperationContext()); 
    } else if (recognizeResult instanceof SpeechResult) { 
        SpeechResult speechResult = (SpeechResult) recognizeResult; 
        String text = speechResult.getSpeech(); 
        log.info("Recognition completed, text=" + text + ", context=" + event.getOperationContext()); 
    } else { 
        log.info("Recognition completed, result=" + recognizeResult + ", context=" + event.getOperationContext()); 
    } 
}

Example of how you can deserialize the RecognizeFailed event:

if (acsEvent instanceof RecognizeFailed) { 
    RecognizeFailed event = (RecognizeFailed) acsEvent; 
    if (ReasonCode.Recognize.INITIAL_SILENCE_TIMEOUT.equals(event.getReasonCode())) { 
        // Take action for time out 
        log.info("Recognition failed: initial silence time out"); 
    } else if (ReasonCode.Recognize.SPEECH_OPTION_NOT_MATCHED.equals(event.getReasonCode())) { 
        // Take action for option not matched 
        log.info("Recognition failed: speech option not matched"); 
    } else if (ReasonCode.Recognize.DMTF_OPTION_MATCHED.equals(event.getReasonCode())) { 
        // Take action for incorrect tone 
        log.info("Recognition failed: incorrect tone detected"); 
    } else { 
        log.info("Recognition failed, result=" + event.getResultInformation().getMessage() + ", context=" + event.getOperationContext()); 
    } 
}

Example of how you can deserialize the RecognizeCanceled event:

if (acsEvent instanceof RecognizeCanceled) { 
    RecognizeCanceled event = (RecognizeCanceled) acsEvent; 
    log.info("Recognition canceled, context=" + event.getOperationContext()); 
}

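Prerequisites

An Azure account with an active subscription. For details, see the page on creating an account for free.
An Azure Communication Services resource. See the article on creating an Azure Communication Services resource. Note the connection string for this resource.
A new web service application created with the Call Automation SDK.
Node.js installed. You can install it from the official website.

For AI features

Create an Azure AI services resource and connect it to your Azure Communication Services resource.
Create a custom subdomain for your Azure AI services resource.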

Create a new JavaScript application

Create a new JavaScript application in your project directory. Initialize a new Node.js project with the following command. This creates a package.json file for your project, which is used to manage your project's dependencies.

npm init -y

Install the Azure Communication Services Call Automation package

npm install @azure/communication-call-automation

Create a new JavaScript file in your project directory, for example, app.js. You write your JavaScript code in this file. Run your application using Node.js with the following command, which executes the JavaScript code you've written.

node app.js

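Establish a call

By this point you should be familiar with starting calls. If you need to learn more about making a call, follow the quickstart on placing calls. In this quickstart, we create an outbound call.

Call the Recognize action

When your application answers the call, you can provide information about recognizing participant input and playing a prompt.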

DTMF

const maxTonesToCollect = 3; 
const textToPlay = "Welcome to Contoso, please enter 3 DTMF."; 
const playSource: TextSource = { text: textToPlay, voiceName: "en-US-ElizabethNeural", kind: "textSource" }; 
const recognizeOptions: CallMediaRecognizeDtmfOptions = { 
    maxTonesToCollect: maxTonesToCollect, 
    initialSilenceTimeoutInSeconds: 30, 
    playPrompt: playSource, 
    interToneTimeoutInSeconds: 5, 
    interruptPrompt: true, 
    stopDtmfTones: [ DtmfTone.Pound ], 
    kind: "callMediaRecognizeDtmfOptions" 
}; 

await callAutomationClient.getCallConnection(callConnectionId) 
    .getCallMedia() 
    .startRecognizing(targetParticipant, recognizeOptions);

Speech-to-text choices

const choices = [ 
    {  
        label: "Confirm", 
        phrases: [ "Confirm", "First", "One" ], 
        tone: DtmfTone.One 
    }, 
    { 
        label: "Cancel", 
        phrases: [ "Cancel", "Second", "Two" ], 
        tone: DtmfTone.Two 
    } 
]; 

const textToPlay = "Hello, This is a reminder for your appointment at 2 PM, Say Confirm to confirm your appointment or Cancel to cancel the appointment. Thank you!"; 
const playSource: TextSource = { text: textToPlay, voiceName: "en-US-ElizabethNeural", kind: "textSource" }; 
const recognizeOptions: CallMediaRecognizeChoiceOptions = { 
    choices: choices, 
    interruptPrompt: true, 
    initialSilenceTimeoutInSeconds: 30, 
    playPrompt: playSource, 
    operationContext: "AppointmentReminderMenu", 
    kind: "callMediaRecognizeChoiceOptions",
    //Only add the speechRecognitionModelEndpointId if you have a custom speech model you would like to use
    speechRecognitionModelEndpointId: "YourCustomSpeechEndpointId"
}; 

await callAutomationClient.getCallConnection(callConnectionId) 
    .getCallMedia() 
    .startRecognizing(targetParticipant, recognizeOptions);

Speech-to-text

const textToPlay = "Hi, how can I help you today?"; 
const playSource: TextSource = { text: textToPlay, voiceName: "en-US-ElizabethNeural", kind: "textSource" }; 
const recognizeOptions: CallMediaRecognizeSpeechOptions = { 
    endSilenceTimeoutInSeconds: 1, 
    playPrompt: playSource, 
    operationContext: "OpenQuestionSpeech", 
    kind: "callMediaRecognizeSpeechOptions",
    //Only add the speechRecognitionModelEndpointId if you have a custom speech model you would like to use
    speechRecognitionModelEndpointId: "YourCustomSpeechEndpointId"
}; 

await callAutomationClient.getCallConnection(callConnectionId) 
    .getCallMedia() 
    .startRecognizing(targetParticipant, recognizeOptions);

Speech-to-text or DTMF

const maxTonesToCollect = 1; 
const textToPlay = "Hi, how can I help you today, you can press 0 to speak to an agent?"; 
const playSource: TextSource = { text: textToPlay, voiceName: "en-US-ElizabethNeural", kind: "textSource" }; 
const recognizeOptions: CallMediaRecognizeSpeechOrDtmfOptions = { 
    maxTonesToCollect: maxTonesToCollect, 
    endSilenceTimeoutInSeconds: 1, 
    playPrompt: playSource, 
    initialSilenceTimeoutInSeconds: 30, 
    interruptPrompt: true, 
    operationContext: "OpenQuestionSpeechOrDtmf", 
    kind: "callMediaRecognizeSpeechOrDtmfOptions",
    //Only add the speechRecognitionModelEndpointId if you have a custom speech model you would like to use
    speechRecognitionModelEndpointId: "YourCustomSpeechEndpointId"
}; 

await callAutomationClient.getCallConnection(callConnectionId) 
    .getCallMedia() 
    .startRecognizing(targetParticipant, recognizeOptions);

Note

If parameters aren't set, the defaults are applied where possible.

Receiving recognize event updates

Example of how you can deserialize the RecognizeCompleted event:

if (event.type === "Microsoft.Communication.RecognizeCompleted") { 
    if (eventData.recognitionType === "dtmf") { 
        const tones = eventData.dtmfResult.tones; 
        console.log("Recognition completed, tones=%s, context=%s", tones, eventData.operationContext); 
    } else if (eventData.recognitionType === "choices") { 
        const labelDetected = eventData.choiceResult.label; 
        const phraseDetected = eventData.choiceResult.recognizedPhrase; 
        console.log("Recognition completed, labelDetected=%s, phraseDetected=%s, context=%s", labelDetected, phraseDetected, eventData.operationContext); 
    } else if (eventData.recognitionType === "speech") { 
        const text = eventData.speechResult.speech; 
        console.log("Recognition completed, text=%s, context=%s", text, eventData.operationContext); 
    } else { 
        console.log("Recognition completed: data=%s", JSON.stringify(eventData, null, 2)); 
    } 
}

Example of how you can deserialize the RecognizeFailed event:

if (event.type === "Microsoft.Communication.RecognizeFailed") {
    console.log("Recognize failed: data=%s", JSON.stringify(eventData, null, 2));
}

Example of how you can deserialize the RecognizeCanceled event:

if (event.type === "Microsoft.Communication.RecognizeCanceled") {
    console.log("Recognize canceled, context=%s", eventData.operationContext);
}

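Prerequisites

An Azure account with an active subscription. For details, see the page on creating an account for free.
An Azure Communication Services resource. See the article on creating an Azure Communication Services resource. Note the connection string for this resource.
A new web service application created with the Call Automation SDK.
Python installed. You can install it from the official site.

For AI features

Create an Azure AI services resource and connect it to your Azure Communication Services resource.
Create a custom subdomain for your Azure AI services resource.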

Create a new Python application

Set up a Python virtual environment for your project

python -m venv play-audio-app

Activate your virtual environment

On Windows, use the following command:

.\play-audio-app\Scripts\activate

On Unix, use the following command:

source play-audio-app/bin/activate

Install the Azure Communication Services Call Automation package

pip install azure-communication-callautomation

Create your application file in your project directory, for example, app.py. You write your Python code in this file.

Run your application using Python with the following command, which executes the Python code you've written.

python app.py

Establish a call

Call the Recognize action

When your application answers the call, you can provide information about recognizing participant input and playing a prompt.

DTMF

max_tones_to_collect = 3 
text_to_play = "Welcome to Contoso, please enter 3 DTMF." 
play_source = TextSource(text=text_to_play, voice_name="en-US-ElizabethNeural") 
call_automation_client.get_call_connection(call_connection_id).start_recognizing_media( 
    dtmf_max_tones_to_collect=max_tones_to_collect, 
    input_type=RecognizeInputType.DTMF, 
    target_participant=target_participant, 
    initial_silence_timeout=30, 
    play_prompt=play_source, 
    dtmf_inter_tone_timeout=5, 
    interrupt_prompt=True, 
    dtmf_stop_tones=[ DtmfTone.POUND ])

Speech-to-text choices

choices = [ 
    RecognitionChoice( 
        label="Confirm", 
        phrases=[ "Confirm", "First", "One" ], 
        tone=DtmfTone.ONE 
    ), 
    RecognitionChoice( 
        label="Cancel", 
        phrases=[ "Cancel", "Second", "Two" ], 
        tone=DtmfTone.TWO 
    ) 
] 
text_to_play = "Hello, This is a reminder for your appointment at 2 PM, Say Confirm to confirm your appointment or Cancel to cancel the appointment. Thank you!" 
play_source = TextSource(text=text_to_play, voice_name="en-US-ElizabethNeural") 
call_automation_client.get_call_connection(call_connection_id).start_recognizing_media( 
    input_type=RecognizeInputType.CHOICES, 
    target_participant=target_participant, 
    choices=choices, 
    interrupt_prompt=True, 
    initial_silence_timeout=30, 
    play_prompt=play_source, 
    operation_context="AppointmentReminderMenu",
    # Only add the speech_recognition_model_endpoint_id if you have a custom speech model you would like to use
    speech_recognition_model_endpoint_id="YourCustomSpeechModelEndpointId")
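
How a label is chosen from phrases or a tone can be pictured with a small local model. The service performs the real matching; the sketch below is only illustrative. The function `match_choice` and the plain-dictionary shape of each choice are hypothetical, and case-insensitive phrase comparison is an assumption rather than documented behavior.

```python
def match_choice(choices, phrase=None, tone=None):
    """Return the label of the first choice matched by phrase or tone, else None.

    Each choice is a dict with "label", "phrases", and an optional "tone".
    Phrase comparison is case-insensitive (an assumption of this sketch).
    """
    for choice in choices:
        phrases = [p.lower() for p in choice["phrases"]]
        if phrase is not None and phrase.lower() in phrases:
            return choice["label"]
        if tone is not None and choice.get("tone") == tone:
            return choice["label"]
    return None


# Same labels, phrases, and tones as the reminder menu above.
menu = [
    {"label": "Confirm", "phrases": ["Confirm", "First", "One"], "tone": "one"},
    {"label": "Cancel", "phrases": ["Cancel", "Second", "Two"], "tone": "two"},
]
```

With this menu, saying "second" or pressing the key mapped to `"two"` both resolve to the `Cancel` label, which is the value an application would branch on.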

Speech-to-text

text_to_play = "Hi, how can I help you today?" 
play_source = TextSource(text=text_to_play, voice_name="en-US-ElizabethNeural") 
call_automation_client.get_call_connection(call_connection_id).start_recognizing_media( 
    input_type=RecognizeInputType.SPEECH, 
    target_participant=target_participant, 
    end_silence_timeout=1, 
    play_prompt=play_source, 
    operation_context="OpenQuestionSpeech",
    # Only add the speech_recognition_model_endpoint_id if you have a custom speech model you would like to use
    speech_recognition_model_endpoint_id="YourCustomSpeechModelEndpointId")

Speech-to-text or DTMF

max_tones_to_collect = 1 
text_to_play = "Hi, how can I help you today, you can also press 0 to speak to an agent." 
play_source = TextSource(text=text_to_play, voice_name="en-US-ElizabethNeural") 
call_automation_client.get_call_connection(call_connection_id).start_recognizing_media( 
    dtmf_max_tones_to_collect=max_tones_to_collect, 
    input_type=RecognizeInputType.SPEECH_OR_DTMF, 
    target_participant=target_participant, 
    end_silence_timeout=1, 
    play_prompt=play_source, 
    initial_silence_timeout=30, 
    interrupt_prompt=True, 
    operation_context="OpenQuestionSpeechOrDtmf",
    # Only add the speech_recognition_model_endpoint_id if you have a custom speech model you would like to use
    speech_recognition_model_endpoint_id="YourCustomSpeechModelEndpointId")  
app.logger.info("Start recognizing")

Note

If a parameter is not set, its default value is applied where possible.
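
Because unset parameters fall back to their defaults, one way to keep call sites tidy is to assemble only the keyword arguments you actually want to override. The following is a minimal sketch, not part of the SDK: `build_recognize_kwargs` is a hypothetical helper, and its keyword names simply mirror those used in the samples above.

```python
from typing import Any, Dict, Optional


def build_recognize_kwargs(
    operation_context: str,
    initial_silence_timeout: Optional[int] = None,
    end_silence_timeout: Optional[int] = None,
    speech_recognition_model_endpoint_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Return only the options that were explicitly set (hypothetical helper)."""
    candidates = {
        "operation_context": operation_context,
        "initial_silence_timeout": initial_silence_timeout,
        "end_silence_timeout": end_silence_timeout,
        "speech_recognition_model_endpoint_id": speech_recognition_model_endpoint_id,
    }
    # Drop anything left as None so that the default applies instead.
    return {name: value for name, value in candidates.items() if value is not None}


kwargs = build_recognize_kwargs("OpenQuestionSpeech", end_silence_timeout=1)
print(kwargs)
```

The resulting dictionary could then be unpacked into the recognize call alongside the required arguments, which also keeps the custom speech model endpoint out of the request unless you supplied one.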

Receiving recognize event updates

Example of how you can deserialize the RecognizeCompleted event:

if event.type == "Microsoft.Communication.RecognizeCompleted":
    app.logger.info("Recognize completed: data=%s", event.data)
    if event.data['recognitionType'] == "dtmf":
        # DTMF: the collected tones
        tones = event.data['dtmfResult']['tones']
        app.logger.info("Recognition completed, tones=%s, context=%s", tones, event.data.get('operationContext'))
    elif event.data['recognitionType'] == "choices":
        # Choices: the matched label and the phrase that was heard
        label_detected = event.data['choiceResult']['label']
        phrase_detected = event.data['choiceResult']['recognizedPhrase']
        app.logger.info("Recognition completed, labelDetected=%s, phraseDetected=%s, context=%s", label_detected, phrase_detected, event.data.get('operationContext'))
    elif event.data['recognitionType'] == "speech":
        # Speech: the transcribed text
        text = event.data['speechResult']['speech']
        app.logger.info("Recognition completed, text=%s, context=%s", text, event.data.get('operationContext'))
    else:
        app.logger.info("Recognition completed: data=%s", event.data)
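
The branching in the handler above can also be factored into a small function that is easy to unit-test without a live call. This is a sketch under stated assumptions: `summarize_recognize_result` is a hypothetical helper, the field names are the ones used in the handler above, and the sample payloads are hand-written for illustration rather than captured from the service.

```python
from typing import Any, Dict, Tuple


def summarize_recognize_result(data: Dict[str, Any]) -> Tuple[str, Any]:
    """Return (recognition_type, value) from RecognizeCompleted event data."""
    recognition_type = data.get("recognitionType", "unknown")
    if recognition_type == "dtmf":
        return recognition_type, data["dtmfResult"]["tones"]
    if recognition_type == "choices":
        return recognition_type, data["choiceResult"]["label"]
    if recognition_type == "speech":
        return recognition_type, data["speechResult"]["speech"]
    # Unrecognized type: report it, but carry no value.
    return recognition_type, None


# Hand-written sample payloads (illustrative values only).
dtmf_data = {"recognitionType": "dtmf", "dtmfResult": {"tones": ["one"]}}
choice_data = {
    "recognitionType": "choices",
    "choiceResult": {"label": "Confirm", "recognizedPhrase": "confirm"},
}

print(summarize_recognize_result(dtmf_data))
print(summarize_recognize_result(choice_data))
```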

Example of how you can deserialize the RecognizeFailed event:

if event.type == "Microsoft.Communication.RecognizeFailed":
    app.logger.info("Recognize failed: data=%s", event.data)

Example of how you can deserialize the RecognizeCanceled event:

if event.type == "Microsoft.Communication.RecognizeCanceled":
    # Handle the RecognizeCanceled event according to your application logic
    app.logger.info("Recognize canceled: data=%s", event.data)

Event codes

Status	Code	Subcode	Message
RecognizeCompleted	200	8531	Action completed, max digits received.
RecognizeCompleted	200	8514	Action completed as stop tone was detected.
RecognizeCompleted	400	8508	Action failed, the operation was canceled.
RecognizeCompleted	400	8532	Action failed, inter-digit silence timeout reached.
RecognizeCanceled	400	8508	Action failed, the operation was canceled.
RecognizeFailed	400	8510	Action failed, initial silence timeout reached.
RecognizeFailed	500	8511	Action failed, encountered a failure while trying to play the prompt.
RecognizeFailed	500	8512	Unknown internal server error.
RecognizeFailed	400	8532	Action failed, inter-digit silence timeout reached.
RecognizeFailed	400	8565	Action failed, bad request to Azure AI services. Check the input parameters.
RecognizeFailed	400	8565	Action failed, bad request to Azure AI services. Unable to process the payload provided; check the play source input.
RecognizeFailed	401	8565	Action failed, Azure AI services authentication error.
RecognizeFailed	403	8565	Action failed, forbidden request to Azure AI services; the free subscription used by the request ran out of quota.
RecognizeFailed	429	8565	Action failed, requests exceeded the number of allowed concurrent requests for the Azure AI services subscription.
RecognizeFailed	408	8565	Action failed, request to Azure AI services timed out.
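
One way to act on these codes is a lookup from subcode to a follow-up step. The sketch below is illustrative only: the action names and the `next_action` helper are hypothetical, and it assumes the failure details arrive in the event data under a `resultInformation` object with a `subCode` field, which you should verify against the payloads your application actually receives.

```python
from typing import Any, Dict

# Subcodes taken from the table above; the action names are made up for this sketch.
SUBCODE_ACTIONS = {
    8510: "reprompt",           # initial silence timeout reached
    8532: "reprompt",           # inter-digit silence timeout reached
    8511: "check_prompt",       # failure while playing the prompt
    8565: "check_ai_services",  # problem with the Azure AI services request
    8512: "escalate",           # unknown internal server error
}


def next_action(data: Dict[str, Any]) -> str:
    """Pick a follow-up step for a RecognizeFailed event (hypothetical helper)."""
    # Assumed payload shape: {"resultInformation": {"code": ..., "subCode": ...}}
    info = data.get("resultInformation") or {}
    return SUBCODE_ACTIONS.get(info.get("subCode"), "escalate")


sample = {"resultInformation": {"code": 400, "subCode": 8510}}
print(next_action(sample))
```

Timeouts are usually worth a second prompt, whereas the 8565 family points at configuration or quota issues that a retry alone is unlikely to fix.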

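Known limitations

In-band DTMF is not supported; use RFC 2833 DTMF instead.
Text-to-speech text prompts support a maximum of 400 characters. If your prompt is longer than this, we suggest using SSML for text-to-speech based play actions.
For scenarios where you exceed your Speech service quota limit, you can request an increase by following the documented steps for raising that limit.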
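
The 400-character limit can be checked before the prompt is built. The following is a minimal sketch rather than SDK code: `prompt_payload` is a hypothetical helper that returns plain text for short prompts and otherwise wraps the text in a basic SSML document. The voice name is the one used in the samples above, and the `en-US` language tag is an assumption.

```python
from typing import Tuple
from xml.sax.saxutils import escape

MAX_TEXT_PROMPT_CHARS = 400  # limit stated in the list above


def prompt_payload(text: str, voice_name: str = "en-US-ElizabethNeural") -> Tuple[str, str]:
    """Return ("text", text) for short prompts, else ("ssml", markup)."""
    if len(text) <= MAX_TEXT_PROMPT_CHARS:
        return "text", text
    # Escape &, <, and > so the prompt text cannot break the XML markup.
    ssml = (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xml:lang="en-US">'
        f'<voice name="{voice_name}">{escape(text)}</voice></speak>'
    )
    return "ssml", ssml


kind, body = prompt_payload("Press 1 for billing & payments. " * 20)
print(kind, len(body))
```

A "text" result can be passed to `TextSource` as in the samples above, while an "ssml" result is intended for the SDK's SSML-based play source.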

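Clean up resources

If you want to clean up and remove a Communication Services subscription, you can delete the resource or the resource group. Deleting the resource group also deletes any other resources associated with it. For details, see the article on cleaning up resources.

Next steps

Learn more about gathering user input
Learn more about playing audio in a call
Learn more about Call Automation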
