Speech to speech chat

pakazs 20 Reputation points
2023-07-26T19:51:31.5666667+00:00

Hello all, I am interested in leveraging Azure OpenAI to do speech-to-speech chat. I have a few questions: Is my text in the middle safe? Can I do it in the studio? What's the process?

Azure OpenAI Service
An Azure service that provides access to OpenAI's GPT-3 models with enterprise capabilities.

Accepted answer
  1. YutongTie-MSFT 53,871 Reputation points
    2023-07-26T23:04:06.1733333+00:00

    Hello @pakazs

    Thanks for reaching out to us. Yes, you can do this with the Speech SDK - https://learn.microsoft.com/en-us/azure/ai-services/speech-service/openai-speech?tabs=linux&pivots=programming-language-csharp

    You can use Azure AI services Speech to converse with Azure OpenAI Service. The text recognized by the Speech service is sent to Azure OpenAI. The text response from Azure OpenAI is then synthesized by the Speech service.

    Speak into the microphone to start a conversation with Azure OpenAI.

    • The Speech service recognizes your speech and converts it into text (speech to text).
    • Your request as text is sent to Azure OpenAI.
    • The Speech service text to speech (TTS) feature synthesizes the response from Azure OpenAI to the default speaker.

    Although the experience of this example is a back-and-forth exchange, Azure OpenAI doesn't remember the context of your conversation.

    A quick sample in C# is here:

    using System;
    using System.IO;
    using System.Threading.Tasks;
    using Microsoft.CognitiveServices.Speech;
    using Microsoft.CognitiveServices.Speech.Audio;
    using Azure;
    using Azure.AI.OpenAI;
    using static System.Environment;
    
    class Program 
    {
        // This example requires environment variables named "OPEN_AI_KEY" and "OPEN_AI_ENDPOINT"
        // Your endpoint should look like the following https://YOUR_OPEN_AI_RESOURCE_NAME.openai.azure.com/
        static string openAIKey = Environment.GetEnvironmentVariable("OPEN_AI_KEY");
        static string openAIEndpoint = Environment.GetEnvironmentVariable("OPEN_AI_ENDPOINT");
    
        // Enter the deployment name you chose when you deployed the model.
        static string engine = "text-davinci-003";
    
        // This example requires environment variables named "SPEECH_KEY" and "SPEECH_REGION"
        static string speechKey = Environment.GetEnvironmentVariable("SPEECH_KEY");
        static string speechRegion = Environment.GetEnvironmentVariable("SPEECH_REGION");
    
        // Prompts Azure OpenAI with a request and synthesizes the response.
        async static Task AskOpenAI(string prompt)
        {
            // Ask Azure OpenAI
            OpenAIClient client = new(new Uri(openAIEndpoint), new AzureKeyCredential(openAIKey));
            var completionsOptions = new CompletionsOptions()
            {
                Prompts = { prompt },
                MaxTokens = 100,
            };
            Response<Completions> completionsResponse = client.GetCompletions(engine, completionsOptions);
            string text = completionsResponse.Value.Choices[0].Text.Trim();
            Console.WriteLine($"Azure OpenAI response: {text}");
    
            var speechConfig = SpeechConfig.FromSubscription(speechKey, speechRegion);
            // The language of the voice that speaks.
            speechConfig.SpeechSynthesisVoiceName = "en-US-JennyMultilingualNeural"; 
            var audioOutputConfig = AudioConfig.FromDefaultSpeakerOutput();
    
            using (var speechSynthesizer = new SpeechSynthesizer(speechConfig, audioOutputConfig))
            {
                var speechSynthesisResult = await speechSynthesizer.SpeakTextAsync(text).ConfigureAwait(true);
    
                if (speechSynthesisResult.Reason == ResultReason.SynthesizingAudioCompleted)
                {
                    Console.WriteLine($"Speech synthesized to speaker for text: [{text}]");
                }
                else if (speechSynthesisResult.Reason == ResultReason.Canceled)
                {
                    var cancellationDetails = SpeechSynthesisCancellationDetails.FromResult(speechSynthesisResult);
                    Console.WriteLine($"Speech synthesis canceled: {cancellationDetails.Reason}");
    
                    if (cancellationDetails.Reason == CancellationReason.Error)
                    {
                        Console.WriteLine($"Error details: {cancellationDetails.ErrorDetails}");
                    }
                }
            }
        }
    
        // Continuously listens for speech input to recognize and send as text to Azure OpenAI
        async static Task ChatWithOpenAI()
        {
            // Should be the locale for the speaker's language.
            var speechConfig = SpeechConfig.FromSubscription(speechKey, speechRegion);        
            speechConfig.SpeechRecognitionLanguage = "en-US";
    
            using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();
            using var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);
            var conversationEnded = false;
    
            while(!conversationEnded)
            {
                Console.WriteLine("Azure OpenAI is listening. Say 'Stop' or press Ctrl-Z to end the conversation.");
    
                // Get audio from the microphone and then send it to the TTS service.
                var speechRecognitionResult = await speechRecognizer.RecognizeOnceAsync();           
    
                switch (speechRecognitionResult.Reason)
                {
                    case ResultReason.RecognizedSpeech:
                        if (speechRecognitionResult.Text == "Stop.")
                        {
                            Console.WriteLine("Conversation ended.");
                            conversationEnded = true;
                        }
                        else
                        {
                            Console.WriteLine($"Recognized speech: {speechRecognitionResult.Text}");
                            await AskOpenAI(speechRecognitionResult.Text).ConfigureAwait(true);
                        }
                        break;
                    case ResultReason.NoMatch:
                        Console.WriteLine("No speech could be recognized.");
                        break;
                    case ResultReason.Canceled:
                        var cancellationDetails = CancellationDetails.FromResult(speechRecognitionResult);
                        Console.WriteLine($"Speech Recognition canceled: {cancellationDetails.Reason}");
                        if (cancellationDetails.Reason == CancellationReason.Error)
                        {
                            Console.WriteLine($"Error details={cancellationDetails.ErrorDetails}");
                        }
                        break;
                }
            }
        }
    
        async static Task Main(string[] args)
        {
            try
            {
                await ChatWithOpenAI().ConfigureAwait(true);
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
            }
        }
    }
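
    As noted above, Azure OpenAI doesn't remember the context of your conversation in this sample: each recognized utterance is sent as an independent prompt. A minimal sketch of one way to carry context yourself is to keep a running transcript and prepend it to each prompt. The `ConversationHistory` class below is a hypothetical helper, not part of any SDK, and a real app would also trim old turns to stay within the model's token limit:

    ```csharp
    using System;
    using System.Text;

    // Hypothetical helper (not part of any SDK): keeps a running transcript and
    // prepends it to each prompt so the model can see earlier turns.
    class ConversationHistory
    {
        private readonly StringBuilder history = new StringBuilder();

        // Append the user's turn and return the full prompt to send to the model.
        public string BuildPrompt(string userText)
        {
            history.Append("User: ").Append(userText).AppendLine();
            return history.ToString() + "Assistant:";
        }

        // Record the model's reply so the next turn can see it.
        public void RecordReply(string assistantText)
        {
            history.Append("Assistant: ").Append(assistantText).AppendLine();
        }

        public string Transcript => history.ToString();
    }
    ```

    In `AskOpenAI`, you would call `BuildPrompt(prompt)` before `GetCompletions` and `RecordReply(text)` after, instead of sending the raw prompt on its own.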

    I hope this helps, please let me know if you have more questions.

    Regards,

    Yutong

    -Please accept the answer and vote 'Yes' if you found it helpful, to support the community. Thanks a lot.

    1 person found this answer helpful.

1 additional answer

Sort by: Most helpful
  1. Dillon Silzer 57,631 Reputation points
    2023-07-26T22:11:10.3766667+00:00

    Hello pakazs,

    You can use the Azure OpenAI REST APIs to send requests:

    https://learn.microsoft.com/en-us/azure/ai-services/openai/reference

    The REST API is secured by an API key sent in the header of your request, or you can use Azure AD authentication instead. Either way, as long as the application on your side is secured, your requests to and from Azure OpenAI will be secure as well.

    All requests sent to the REST API go over HTTPS (TLS-encrypted), so the transport is secure (unless your side has been compromised).
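
    To illustrate, here is a minimal sketch of building such a request with `HttpClient` against the completions endpoint. The resource endpoint, deployment name, and key are placeholders you would replace with your own; `api-version=2023-05-15` is assumed, and a real app should JSON-encode the prompt rather than interpolate it:

    ```csharp
    using System;
    using System.Net.Http;
    using System.Text;

    class RestSample
    {
        // Builds a POST request against the Azure OpenAI completions endpoint.
        // Endpoint, deployment name, and key are placeholders for your own resource.
        public static HttpRequestMessage BuildCompletionRequest(
            string endpoint, string deployment, string apiKey, string prompt)
        {
            var uri = $"{endpoint}/openai/deployments/{deployment}/completions?api-version=2023-05-15";
            var request = new HttpRequestMessage(HttpMethod.Post, uri);
            // The API key travels in the "api-key" header; HTTPS encrypts it in transit.
            request.Headers.Add("api-key", apiKey);
            // Note: for simplicity the prompt is interpolated directly; use a JSON
            // serializer in real code so quotes and special characters are escaped.
            request.Content = new StringContent(
                $"{{\"prompt\": \"{prompt}\", \"max_tokens\": 100}}",
                Encoding.UTF8, "application/json");
            return request;
        }

        static void Main()
        {
            var request = BuildCompletionRequest(
                "https://YOUR_RESOURCE.openai.azure.com", "YOUR_DEPLOYMENT",
                Environment.GetEnvironmentVariable("OPEN_AI_KEY") ?? "", "Hello");
            Console.WriteLine(request.RequestUri);
        }
    }
    ```

    Sending the built request with `new HttpClient().SendAsync(request)` returns the completion as JSON, the same payload the SDK parses for you.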


    If this is helpful, please accept the answer.

    0 comments No comments
