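Azure OpenAI speech to speech chat

Article
02/23/2024

In this how-to guide, you use Azure AI Speech to converse with Azure OpenAI Service. The text recognized by the Speech service is sent to Azure OpenAI. The Speech service then synthesizes speech from the text response from Azure OpenAI.

Speak into the microphone to start a conversation with Azure OpenAI.

The Speech service recognizes your speech and converts it into text (speech to text).
Your request as text is sent to Azure OpenAI.
The Speech service text to speech feature synthesizes the response from Azure OpenAI to the default speaker.

Although the experience of this example is a back-and-forth exchange, Azure OpenAI doesn't remember the context of your conversation, because each request sends only the current prompt.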

Important

To complete the steps in this guide, you must have access to Microsoft Azure OpenAI Service in your Azure subscription. Currently, access to this service is granted only by application. Apply for access to Azure OpenAI by completing the form at https://aka.ms/oai/access.

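Prerequisites

Azure subscription - Create one for free
Create a Microsoft Azure OpenAI Service resource in the Azure portal.
Deploy a model with your Azure OpenAI resource. For more information about model deployment, see the resource deployment guide.
Get the Azure OpenAI resource key and endpoint. After your Azure OpenAI resource is deployed, select Go to resource to view and manage keys. For more information about Azure AI services resources, see Get the keys for your resource.
Create a Speech resource in the Azure portal.
Get the Speech resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For more information about Azure AI services resources, see Get the keys for your resource.

Set up the environment

The Speech SDK is available as a NuGet package and implements .NET Standard 2.0. You install the Speech SDK later in this guide, but first check the SDK installation guide for any more requirements.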

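Set environment variables

This example requires environment variables named OPEN_AI_KEY, OPEN_AI_ENDPOINT, OPEN_AI_DEPLOYMENT_NAME, SPEECH_KEY, and SPEECH_REGION.

Your application must be authenticated to access Azure AI services resources. For production, use a secure way of storing and accessing your credentials. For example, after you get a key for your Speech resource, write it to a new environment variable on the local machine running the application.

Tip

Don't include the key directly in your code, and never post it publicly. See Azure AI services security for more authentication options like Azure Key Vault.

To set the environment variables, open a console window, and follow the instructions for your operating system and development environment.

To set the OPEN_AI_KEY environment variable, replace your-openai-key with one of the keys for your resource.
To set the OPEN_AI_ENDPOINT environment variable, replace your-openai-endpoint with the endpoint for your resource.
To set the OPEN_AI_DEPLOYMENT_NAME environment variable, replace your-openai-deployment-name with the name of your model deployment.
To set the SPEECH_KEY environment variable, replace your-speech-key with one of the keys for your resource.
To set the SPEECH_REGION environment variable, replace your-speech-region with the region of your resource.

Windows

setx OPEN_AI_KEY your-openai-key
setx OPEN_AI_ENDPOINT your-openai-endpoint
setx OPEN_AI_DEPLOYMENT_NAME your-openai-deployment-name
setx SPEECH_KEY your-speech-key
setx SPEECH_REGION your-speech-region

Note

If you only need to access the environment variables in the current running console, set them with set instead of setx.

After you add the environment variables, you might need to restart any running programs that need to read them, including the console window. For example, if Visual Studio is your editor, restart Visual Studio before you run the example.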

Linux

export OPEN_AI_KEY=your-openai-key
export OPEN_AI_ENDPOINT=your-openai-endpoint
export OPEN_AI_DEPLOYMENT_NAME=your-openai-deployment-name
export SPEECH_KEY=your-speech-key
export SPEECH_REGION=your-speech-region

After you add the environment variables, run source ~/.bashrc from your console window to make the changes effective.

Bash

On macOS, edit your .bash_profile and add the environment variables:

export OPEN_AI_KEY=your-openai-key
export OPEN_AI_ENDPOINT=your-openai-endpoint
export OPEN_AI_DEPLOYMENT_NAME=your-openai-deployment-name
export SPEECH_KEY=your-speech-key
export SPEECH_REGION=your-speech-region

After you add the environment variables, run source ~/.bash_profile from your console window to make the changes effective.

Xcode

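For iOS and macOS development, you set the environment variables in Xcode. For example, follow these steps to set the environment variable in Xcode 13.4.1.

Select Product > Scheme > Edit scheme.
Select Arguments on the Run (Debug Run) page.
Under Environment Variables, select the plus (+) sign to add a new environment variable.
Enter SPEECH_KEY for the Name and enter your Speech resource key for the Value.

Repeat the steps to set the other required environment variables.

For more configuration options, see the Xcode documentation.

Recognize speech from a microphone

Follow these steps to create a new console application.

Open a command prompt window in the folder where you want the new project. Run this command to create a console application with the .NET CLI.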
```
dotnet new console
```
The command creates a Program.cs file in the project directory.
Install the Speech SDK in your new project with the .NET CLI.
```
dotnet add package Microsoft.CognitiveServices.Speech
```
Install the Azure OpenAI SDK (prerelease) in your new project with the .NET CLI.
```
dotnet add package Azure.AI.OpenAI --prerelease 
```

Replace the contents of Program.cs with the following code.

using System.Text;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using Azure;
using Azure.AI.OpenAI;

// This example requires environment variables named "OPEN_AI_KEY", "OPEN_AI_ENDPOINT" and "OPEN_AI_DEPLOYMENT_NAME"
// Your endpoint should look like the following https://YOUR_OPEN_AI_RESOURCE_NAME.openai.azure.com/
string openAIKey = Environment.GetEnvironmentVariable("OPEN_AI_KEY") ??
                   throw new ArgumentException("Missing OPEN_AI_KEY");
string openAIEndpoint = Environment.GetEnvironmentVariable("OPEN_AI_ENDPOINT") ??
                        throw new ArgumentException("Missing OPEN_AI_ENDPOINT");

// Enter the deployment name you chose when you deployed the model.
string engine = Environment.GetEnvironmentVariable("OPEN_AI_DEPLOYMENT_NAME") ??
                throw new ArgumentException("Missing OPEN_AI_DEPLOYMENT_NAME");

// This example requires environment variables named "SPEECH_KEY" and "SPEECH_REGION"
string speechKey = Environment.GetEnvironmentVariable("SPEECH_KEY") ??
                   throw new ArgumentException("Missing SPEECH_KEY");
string speechRegion = Environment.GetEnvironmentVariable("SPEECH_REGION") ??
                      throw new ArgumentException("Missing SPEECH_REGION");

// Sentence end symbols for splitting the response into sentences.
List<string> sentenceSeparators = new() { ".", "!", "?", ";", "。", "！", "？", "；", "\n" };

try
{
    await ChatWithOpenAI();
}
catch (Exception ex)
{
    Console.WriteLine(ex);
}

// Prompts Azure OpenAI with a request and synthesizes the response.
async Task AskOpenAI(string prompt)
{
    object consoleLock = new();
    var speechConfig = SpeechConfig.FromSubscription(speechKey, speechRegion);

    // The language of the voice that speaks.
    speechConfig.SpeechSynthesisVoiceName = "en-US-JennyMultilingualNeural";
    var audioOutputConfig = AudioConfig.FromDefaultSpeakerOutput();
    using var speechSynthesizer = new SpeechSynthesizer(speechConfig, audioOutputConfig);
    speechSynthesizer.Synthesizing += (sender, args) =>
    {
        lock (consoleLock)
        {
            Console.ForegroundColor = ConsoleColor.Yellow;
            Console.Write($"[Audio]");
            Console.ResetColor();
        }
    };

    // Ask Azure OpenAI
    OpenAIClient client = new(new Uri(openAIEndpoint), new AzureKeyCredential(openAIKey));
    var completionsOptions = new ChatCompletionsOptions()
    {
        DeploymentName = engine,
        Messages = { new ChatRequestUserMessage(prompt) },
        MaxTokens = 100,
    };
    var responseStream = await client.GetChatCompletionsStreamingAsync(completionsOptions);

    StringBuilder gptBuffer = new();
    await foreach (var completionUpdate in responseStream)
    {
        var message = completionUpdate.ContentUpdate;
        if (string.IsNullOrEmpty(message))
        {
            continue;
        }

        lock (consoleLock)
        {
            Console.ForegroundColor = ConsoleColor.DarkBlue;
            Console.Write($"{message}");
            Console.ResetColor();
        }

        gptBuffer.Append(message);

        if (sentenceSeparators.Any(message.Contains))
        {
            var sentence = gptBuffer.ToString().Trim();
            if (!string.IsNullOrEmpty(sentence))
            {
                await speechSynthesizer.SpeakTextAsync(sentence);
                gptBuffer.Clear();
            }
        }
    }

    // Speak any remaining text that didn't end with a sentence separator.
    var remainingText = gptBuffer.ToString().Trim();
    if (!string.IsNullOrEmpty(remainingText))
    {
        await speechSynthesizer.SpeakTextAsync(remainingText);
    }
}

// Continuously listens for speech input to recognize and send as text to Azure OpenAI
async Task ChatWithOpenAI()
{
    // Should be the locale for the speaker's language.
    var speechConfig = SpeechConfig.FromSubscription(speechKey, speechRegion);
    speechConfig.SpeechRecognitionLanguage = "en-US";

    using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();
    using var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);
    var conversationEnded = false;

    while (!conversationEnded)
    {
        Console.WriteLine("Azure OpenAI is listening. Say 'Stop' or press Ctrl-Z to end the conversation.");

        // Get audio from the microphone and send it to the Speech service for speech to text recognition.
        var speechRecognitionResult = await speechRecognizer.RecognizeOnceAsync();

        switch (speechRecognitionResult.Reason)
        {
            case ResultReason.RecognizedSpeech:
                if (speechRecognitionResult.Text == "Stop.")
                {
                    Console.WriteLine("Conversation ended.");
                    conversationEnded = true;
                }
                else
                {
                    Console.WriteLine($"Recognized speech: {speechRecognitionResult.Text}");
                    await AskOpenAI(speechRecognitionResult.Text);
                }

                break;
            case ResultReason.NoMatch:
                Console.WriteLine($"No speech could be recognized: ");
                break;
            case ResultReason.Canceled:
                var cancellationDetails = CancellationDetails.FromResult(speechRecognitionResult);
                Console.WriteLine($"Speech Recognition canceled: {cancellationDetails.Reason}");
                if (cancellationDetails.Reason == CancellationReason.Error)
                {
                    Console.WriteLine($"Error details={cancellationDetails.ErrorDetails}");
                }

                break;
        }
    }
}

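To increase or decrease the number of tokens returned by Azure OpenAI, change the MaxTokens property in the ChatCompletionsOptions class instance. For more information about tokens and cost implications, see Azure OpenAI tokens and Azure OpenAI pricing.
Run your new console application to start speech recognition from a microphone: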
```
dotnet run
```

Important

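Make sure that you set the OPEN_AI_KEY, OPEN_AI_ENDPOINT, OPEN_AI_DEPLOYMENT_NAME, SPEECH_KEY, and SPEECH_REGION environment variables as described. If you don't set these variables, the sample fails with an error message.

Speak into your microphone when prompted. The console output includes the prompt to begin speaking, then your request as text, and then the response from Azure OpenAI as text. The response from Azure OpenAI should be converted from text to speech and then output to the default speaker.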

PS C:\dev\openai\csharp> dotnet run
Azure OpenAI is listening. Say 'Stop' or press Ctrl-Z to end the conversation.
Recognized speech:Make a comma separated list of all continents.
Azure OpenAI response:Africa, Antarctica, Asia, Australia, Europe, North America, South America
Speech synthesized to speaker for text [Africa, Antarctica, Asia, Australia, Europe, North America, South America]
Azure OpenAI is listening. Say 'Stop' or press Ctrl-Z to end the conversation.
Recognized speech: Make a comma separated list of 1 Astronomical observatory for each continent. A list should include each continent name in parentheses.
Azure OpenAI response:Mauna Kea Observatories (North America), La Silla Observatory (South America), Tenerife Observatory (Europe), Siding Spring Observatory (Australia), Beijing Xinglong Observatory (Asia), Naukluft Plateau Observatory (Africa), Rutherford Appleton Laboratory (Antarctica)
Speech synthesized to speaker for text [Mauna Kea Observatories (North America), La Silla Observatory (South America), Tenerife Observatory (Europe), Siding Spring Observatory (Australia), Beijing Xinglong Observatory (Asia), Naukluft Plateau Observatory (Africa), Rutherford Appleton Laboratory (Antarctica)]
Azure OpenAI is listening. Say 'Stop' or press Ctrl-Z to end the conversation.
Conversation ended.
PS C:\dev\openai\csharp>

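Remarks

Here are some more considerations:

To change the speech recognition language, replace en-US with another supported language. For example, es-ES for Spanish (Spain). The default language is en-US. For details about how to identify one of multiple languages that might be spoken, see language identification.
To change the voice that you hear, replace en-US-JennyMultilingualNeural with another supported voice. If the voice doesn't speak the language of the text returned from Azure OpenAI, the Speech service doesn't output synthesized audio.
To use a different model, set OPEN_AI_DEPLOYMENT_NAME to the ID of a different deployment. The deployment ID isn't necessarily the same as the model name. You named your deployment when you created it in Azure OpenAI Studio.
Azure OpenAI also performs content moderation on the prompt inputs and generated outputs. The prompts or responses might be filtered if harmful content is detected. For more information, see the content filtering article.

Clean up resources

You can use the Azure portal or Azure Command Line Interface (CLI) to remove the Speech resource you created.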

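Reference documentation | Package (PyPi) | Additional samples on GitHub

Speak into the microphone to start a conversation with Azure OpenAI.

The Speech service recognizes your speech and converts it into text (speech to text).
Your request as text is sent to Azure OpenAI.
The Speech service text to speech feature synthesizes the response from Azure OpenAI to the default speaker.

Although the experience of this example is a back-and-forth exchange, Azure OpenAI doesn't remember the context of your conversation, because each request sends only the current prompt.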
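Because the sample in this guide sends a messages list that contains only the current user prompt, every turn starts fresh. If you want follow-up questions to carry earlier turns, one possible approach is to resend recent turns in the messages list. The following Python sketch is an illustration only: build_messages, history, and max_turns are hypothetical names, not part of the sample or of the OpenAI library.

```python
def build_messages(history, prompt, max_turns=5):
    """Return chat messages: up to max_turns earlier turns plus the new prompt.

    history is a list of (user_text, assistant_text) tuples from earlier turns.
    """
    # history[-0:] would return the whole list, so handle max_turns <= 0 explicitly.
    recent = history[-max_turns:] if max_turns > 0 else []
    messages = []
    for user_text, assistant_text in recent:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    # The new prompt always goes last.
    messages.append({"role": "user", "content": prompt})
    return messages


if __name__ == "__main__":
    history = [("Name a continent.", "Africa.")]
    print(build_messages(history, "Name another one."))
```

To use it, you would pass messages=build_messages(history, prompt) to client.chat.completions.create and append each (prompt, response) pair to history after the reply completes. Resending history increases the prompt tokens of every request, which is why the sketch caps the number of turns.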

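Prerequisites

Azure subscription - Create one for free
Create a Microsoft Azure OpenAI Service resource in the Azure portal.
Deploy a model with your Azure OpenAI resource. For more information about model deployment, see the resource deployment guide.
Get the Azure OpenAI resource key and endpoint. After your Azure OpenAI resource is deployed, select Go to resource to view and manage keys. For more information about Azure AI services resources, see Get the keys for your resource.
Create a Speech resource in the Azure portal.
Get the Speech resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For more information about Azure AI services resources, see Get the keys for your resource.

Set up the environment

The Speech SDK for Python is available as a Python Package Index (PyPI) module. The Speech SDK for Python is compatible with Windows, Linux, and macOS.

On Windows, install the Microsoft Visual C++ Redistributable for Visual Studio 2015, 2017, 2019, and 2022 for your platform. Installing this package for the first time might require a restart.
On Linux, you must use the x64 target architecture.

Install a version of Python from 3.7 or later. First check the SDK installation guide for any more requirements.

The sample imports the os module, which is part of the Python standard library and needs no separate installation. You install the Speech SDK and the OpenAI Python library later in this guide.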

Set environment variables

This example requires environment variables named OPEN_AI_KEY, OPEN_AI_ENDPOINT, OPEN_AI_DEPLOYMENT_NAME, SPEECH_KEY, and SPEECH_REGION.
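The Python sample reads these values with os.environ.get, which returns None for a missing variable, so a missing variable isn't reported by name at the point where it's read. A minimal sketch, assuming you'd rather fail fast, checks all five variables up front and lists every one that is missing or empty. load_settings and REQUIRED_VARS are hypothetical names, not part of the sample.

```python
import os

# Names of the environment variables this guide's sample reads.
REQUIRED_VARS = (
    "OPEN_AI_KEY",
    "OPEN_AI_ENDPOINT",
    "OPEN_AI_DEPLOYMENT_NAME",
    "SPEECH_KEY",
    "SPEECH_REGION",
)


def load_settings(environ=None):
    """Return a dict of the required settings, or raise if any are missing.

    environ defaults to os.environ; passing a plain dict makes testing easy.
    """
    environ = os.environ if environ is None else environ
    # Treat empty strings the same as unset variables.
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```

You could call settings = load_settings() at the top of openai-speech.py and read settings["SPEECH_KEY"] and the other values from the returned dict instead of calling os.environ.get for each one.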

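Tip

Don't include the key directly in your code, and never post it publicly. See Azure AI services security for more authentication options like Azure Key Vault.

To set the environment variables, open a console window, and follow the instructions for your operating system and development environment.

To set the OPEN_AI_KEY environment variable, replace your-openai-key with one of the keys for your resource.
To set the OPEN_AI_ENDPOINT environment variable, replace your-openai-endpoint with the endpoint for your resource.
To set the OPEN_AI_DEPLOYMENT_NAME environment variable, replace your-openai-deployment-name with the name of your model deployment.
To set the SPEECH_KEY environment variable, replace your-speech-key with one of the keys for your resource.
To set the SPEECH_REGION environment variable, replace your-speech-region with the region of your resource.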
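Because OPEN_AI_ENDPOINT takes a full URL rather than a region, it's easy to paste the wrong value. The sample's code comment says the endpoint should look like https://YOUR_OPEN_AI_RESOURCE_NAME.openai.azure.com/. The following hypothetical helper, looks_like_azure_openai_endpoint, is a heuristic check against that form only. Endpoints on other host names fail it even if they're valid, so treat a False result as a warning rather than an error.

```python
from urllib.parse import urlparse

# Host name suffix taken from the endpoint form in the sample's code comment.
AZURE_OPENAI_SUFFIX = ".openai.azure.com"


def looks_like_azure_openai_endpoint(endpoint):
    """Heuristically check for the https://<resource-name>.openai.azure.com/ form."""
    parsed = urlparse(endpoint.strip())
    host = parsed.hostname or ""  # None when the value isn't a URL, such as "eastus"
    return (
        parsed.scheme == "https"
        and host.endswith(AZURE_OPENAI_SUFFIX)
        # Require a non-empty resource name in front of the suffix.
        and len(host) > len(AZURE_OPENAI_SUFFIX)
    )
```

For example, a region name such as eastus entered by mistake has no https scheme or host, so the check returns False for it.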

Windows

setx OPEN_AI_KEY your-openai-key
setx OPEN_AI_ENDPOINT your-openai-endpoint
setx OPEN_AI_DEPLOYMENT_NAME your-openai-deployment-name
setx SPEECH_KEY your-speech-key
setx SPEECH_REGION your-speech-region

Note

If you only need to access the environment variables in the current running console, set them with set instead of setx.

Linux

export OPEN_AI_KEY=your-openai-key
export OPEN_AI_ENDPOINT=your-openai-endpoint
export OPEN_AI_DEPLOYMENT_NAME=your-openai-deployment-name
export SPEECH_KEY=your-speech-key
export SPEECH_REGION=your-speech-region

After you add the environment variables, run source ~/.bashrc from your console window to make the changes effective.

Bash

On macOS, edit your .bash_profile and add the environment variables:

export OPEN_AI_KEY=your-openai-key
export OPEN_AI_ENDPOINT=your-openai-endpoint
export OPEN_AI_DEPLOYMENT_NAME=your-openai-deployment-name
export SPEECH_KEY=your-speech-key
export SPEECH_REGION=your-speech-region

After you add the environment variables, run source ~/.bash_profile from your console window to make the changes effective.

Xcode

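For iOS and macOS development, you set the environment variables in Xcode. For example, follow these steps to set the environment variable in Xcode 13.4.1.

Select Product > Scheme > Edit scheme.
Select Arguments on the Run (Debug Run) page.
Under Environment Variables, select the plus (+) sign to add a new environment variable.
Enter SPEECH_KEY for the Name and enter your Speech resource key for the Value.

Repeat the steps to set the other required environment variables.

For more configuration options, see the Xcode documentation.

Recognize speech from a microphone

Follow these steps to create a new console application.

Open a command prompt in the folder where you want the new project.
Run this command to install the Speech SDK: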
```
pip install azure-cognitiveservices-speech
```
Run this command to install the OpenAI SDK:
```
pip install openai
```
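To confirm that both packages are installed in the Python environment you run the sample from, you can query their installed versions. This sketch uses importlib.metadata from the standard library, which requires Python 3.8 or later (slightly newer than the 3.7 minimum stated above); installed_versions is a hypothetical helper name. The sample imports AzureOpenAI from openai, a class that versions of the openai package before 1.0 don't provide, so the reported openai version is worth checking.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as used in the pip install commands above.
PACKAGES = ("azure-cognitiveservices-speech", "openai")


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```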
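Note

This library is maintained by OpenAI, not Microsoft Azure. To track the latest updates to the library, see its release history or the version.py commit history.

Create a file named openai-speech.py in the project folder, and copy the following code into that file.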

import os
import azure.cognitiveservices.speech as speechsdk
from openai import AzureOpenAI

# This example requires environment variables named "OPEN_AI_KEY", "OPEN_AI_ENDPOINT" and "OPEN_AI_DEPLOYMENT_NAME"
# Your endpoint should look like the following https://YOUR_OPEN_AI_RESOURCE_NAME.openai.azure.com/
client = AzureOpenAI(
    azure_endpoint=os.environ.get('OPEN_AI_ENDPOINT'),
    api_key=os.environ.get('OPEN_AI_KEY'),
    api_version="2023-05-15"
)

# This will correspond to the custom name you chose for your deployment when you deployed a model.
deployment_id=os.environ.get('OPEN_AI_DEPLOYMENT_NAME')

# This example requires environment variables named "SPEECH_KEY" and "SPEECH_REGION"
speech_config = speechsdk.SpeechConfig(subscription=os.environ.get('SPEECH_KEY'), region=os.environ.get('SPEECH_REGION'))
audio_output_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)
audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)

# Should be the locale for the speaker's language.
speech_config.speech_recognition_language="en-US"
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

# The language of the voice that responds on behalf of Azure OpenAI.
speech_config.speech_synthesis_voice_name='en-US-JennyMultilingualNeural'
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_output_config)
# Sentence end marks used to split the streamed response for text to speech
tts_sentence_end = [ ".", "!", "?", ";", "。", "！", "？", "；", "\n" ]

# Prompts Azure OpenAI with a request and synthesizes the response.
def ask_openai(prompt):
    # Ask Azure OpenAI and stream the response
    response = client.chat.completions.create(model=deployment_id, max_tokens=200, stream=True, messages=[
        {"role": "user", "content": prompt}
    ])
    collected_messages = []
    last_tts_request = None

    # Iterate through the streamed response
    for chunk in response:
        if len(chunk.choices) > 0:
            chunk_message = chunk.choices[0].delta.content  # extract the message
            if chunk_message is not None:
                collected_messages.append(chunk_message)  # save the message
                if any(mark in chunk_message for mark in tts_sentence_end):  # sentence end found
                    text = ''.join(collected_messages).strip()  # join the received messages together to build a sentence
                    if text != '':  # skip sentences that contain only newlines or spaces
                        print(f"Speech synthesized to speaker for: {text}")
                        last_tts_request = speech_synthesizer.speak_text_async(text)
                        collected_messages.clear()
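    # Synthesize any text left over after the stream ends without a sentence end mark
    text = ''.join(collected_messages).strip()
    if text != '':
        print(f"Speech synthesized to speaker for: {text}")
        last_tts_request = speech_synthesizer.speak_text_async(text)
    # Wait for the last synthesis request to complete before listening again
    if last_tts_request:
        last_tts_request.get()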

# Continuously listens for speech input to recognize and send as text to Azure OpenAI
def chat_with_open_ai():
    while True:
        print("Azure OpenAI is listening. Say 'Stop' or press Ctrl-Z to end the conversation.")
        try:
            # Get audio from the microphone and send it to the Speech service for speech to text recognition.
            speech_recognition_result = speech_recognizer.recognize_once_async().get()

            # If speech is recognized, send it to Azure OpenAI and listen for the response.
            if speech_recognition_result.reason == speechsdk.ResultReason.RecognizedSpeech:
                if speech_recognition_result.text == "Stop.": 
                    print("Conversation ended.")
                    break
                print("Recognized speech: {}".format(speech_recognition_result.text))
                ask_openai(speech_recognition_result.text)
            elif speech_recognition_result.reason == speechsdk.ResultReason.NoMatch:
                print("No speech could be recognized: {}".format(speech_recognition_result.no_match_details))
                break
            elif speech_recognition_result.reason == speechsdk.ResultReason.Canceled:
                cancellation_details = speech_recognition_result.cancellation_details
                print("Speech Recognition canceled: {}".format(cancellation_details.reason))
                if cancellation_details.reason == speechsdk.CancellationReason.Error:
                    print("Error details: {}".format(cancellation_details.error_details))
        except EOFError:
            break

# Main

try:
    chat_with_open_ai()
except Exception as err:
    print("Encountered exception. {}".format(err))
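The sample buffers streamed chunks and speaks the buffered text when it sees one of the tts_sentence_end marks. The following sketch isolates that buffering technique as a standalone generator so that you can test it without Azure: it flushes when a chunk contains an end mark, skips whitespace-only sentences, and also flushes leftover text when the stream ends. split_stream_into_sentences is a hypothetical name, and the chunks here are plain strings standing in for chunk.choices[0].delta.content values.

```python
# Same end marks as the sample's tts_sentence_end list.
SENTENCE_END_MARKS = (".", "!", "?", ";", "。", "！", "？", "；", "\n")


def split_stream_into_sentences(chunks, end_marks=SENTENCE_END_MARKS):
    """Yield text to synthesize as streamed chunks arrive."""
    buffer = []
    for chunk in chunks:
        if not chunk:  # streamed deltas can be None or empty
            continue
        buffer.append(chunk)
        # A chunk can carry more than one character, such as " world.".
        if any(mark in chunk for mark in end_marks):
            text = "".join(buffer).strip()
            buffer.clear()
            if text:  # skip chunks that were only whitespace or newlines
                yield text
    # Flush a final reply fragment that has no closing punctuation.
    text = "".join(buffer).strip()
    if text:
        yield text


if __name__ == "__main__":
    for sentence in split_stream_into_sentences(["Hello", " world.", " Bye"]):
        print(sentence)
```

In ask_openai, you could feed it the non-empty chunk.choices[0].delta.content values and call speech_synthesizer.speak_text_async(sentence) for each sentence it yields. Splitting on any "." also splits inside numbers such as 3.5, a tradeoff the sample's mark list shares.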

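To increase or decrease the number of tokens returned by Azure OpenAI, change the max_tokens parameter. For more information about tokens and cost implications, see Azure OpenAI tokens and Azure OpenAI pricing.
Run your new console application to start speech recognition from a microphone: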
```
python openai-speech.py
```

Important

Make sure that you set the OPEN_AI_KEY, OPEN_AI_ENDPOINT, OPEN_AI_DEPLOYMENT_NAME, SPEECH_KEY, and SPEECH_REGION environment variables as described previously. If you don't set these variables, the sample fails with an error message.

PS C:\dev\openai\python> python.exe .\openai-speech.py
Azure OpenAI is listening. Say 'Stop' or press Ctrl-Z to end the conversation.
Recognized speech:Make a comma separated list of all continents.
Azure OpenAI response:Africa, Antarctica, Asia, Australia, Europe, North America, South America
Speech synthesized to speaker for text [Africa, Antarctica, Asia, Australia, Europe, North America, South America]
Azure OpenAI is listening. Say 'Stop' or press Ctrl-Z to end the conversation.
Recognized speech: Make a comma separated list of 1 Astronomical observatory for each continent. A list should include each continent name in parentheses.
Azure OpenAI response:Mauna Kea Observatories (North America), La Silla Observatory (South America), Tenerife Observatory (Europe), Siding Spring Observatory (Australia), Beijing Xinglong Observatory (Asia), Naukluft Plateau Observatory (Africa), Rutherford Appleton Laboratory (Antarctica)
Speech synthesized to speaker for text [Mauna Kea Observatories (North America), La Silla Observatory (South America), Tenerife Observatory (Europe), Siding Spring Observatory (Australia), Beijing Xinglong Observatory (Asia), Naukluft Plateau Observatory (Africa), Rutherford Appleton Laboratory (Antarctica)]
Azure OpenAI is listening. Say 'Stop' or press Ctrl-Z to end the conversation.
Conversation ended.
PS C:\dev\openai\python>

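Remarks

Here are some more considerations:

To change the speech recognition language, replace en-US with another supported language. For example, es-ES for Spanish (Spain). The default language is en-US. For details about how to identify one of multiple languages that might be spoken, see language identification.
To change the voice that you hear, replace en-US-JennyMultilingualNeural with another supported voice. If the voice doesn't speak the language of the text returned from Azure OpenAI, the Speech service doesn't output synthesized audio.
To use a different model, set OPEN_AI_DEPLOYMENT_NAME to the ID of a different deployment. The deployment ID isn't necessarily the same as the model name. You named your deployment when you created it in Azure OpenAI Studio.
Azure OpenAI also performs content moderation on the prompt inputs and generated outputs. The prompts or responses might be filtered if harmful content is detected. For more information, see the content filtering article.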
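If you change the recognition language, keep in mind that the sample ends the conversation only when the recognized text is exactly "Stop.", including the capitalization and the period. A more forgiving comparison normalizes the text first. The following sketch is hypothetical (is_stop_command and its stop_words parameter aren't part of the sample), and it strips only the ASCII punctuation in string.punctuation, so full-width marks such as 。 would need to be added for other languages.

```python
import string

# Characters trimmed from both ends of the recognized text before comparing.
TRIM_CHARS = string.punctuation + string.whitespace


def is_stop_command(recognized_text, stop_words=("stop",)):
    """Return True if the utterance is only a stop word, ignoring case and punctuation."""
    normalized = recognized_text.strip(TRIM_CHARS).lower()
    return normalized in stop_words
```

You could replace the comparison speech_recognition_result.text == "Stop." with is_stop_command(speech_recognition_result.text), passing stop words in the recognition language through stop_words.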

Clean up resources

You can use the Azure portal or Azure Command Line Interface (CLI) to remove the Speech resource you created.
