Does Azure Speech allow low-latency input streaming? In other words, can we listen to LLMs in real time as the text is being generated?

Nitish Kumar 50 Reputation points
2023-10-07T13:16:11.5666667+00:00

Hello,

Azure OpenAI can generate chunks of text as a stream, without waiting for the full response.

Recently, the ElevenLabs API added support for low-latency input streaming.


Accepted answer
  1. dupammi 8,615 Reputation points Microsoft External Staff
    2023-10-11T11:07:30.5433333+00:00

    Hi @Nitish Kumar ,

    Following up to see whether my "comment" answer above, in the comments section of this thread, helped. Do let us know if you have any queries.

    To reiterate the resolution here, let me jot down the gist of my comment answer above.

    Yes, Azure OpenAI's GPT models can generate text in chunks without waiting for the full response, allowing for a more interactive, real-time conversation.
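    For reference, the chunked generation works by passing stream=True to the openai Python package's ChatCompletion call. Below is a minimal sketch, assuming the pre-1.0 openai package; the deployment name "test_Chatgpt" and the configuration values are placeholders:

    import openai

    # Assumes openai.api_type / api_base / api_version / api_key are already
    # configured for Azure OpenAI, as in the comment answer referenced above.
    response = openai.ChatCompletion.create(
        engine="test_Chatgpt",  # placeholder deployment name
        messages=[{"role": "user", "content": "Say hello"}],
        stream=True,  # yield tokens as they are generated
    )

    for chunk in response:
        if chunk["choices"]:  # some service chunks carry no choices
            delta = chunk["choices"][0]["delta"]
            print(delta.get("content", ""), end="", flush=True)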

    Please have a look at the sample implementation done in the above "comment".

    Please 'Accept as answer' and ‘Upvote’ if it helped so that it can help others in the community looking for help on similar topics. Thank you!


1 additional answer

  1. dupammi 8,615 Reputation points Microsoft External Staff
    2023-10-09T10:44:54.71+00:00

    Hi @Nitish Kumar ,

    Thank you for your question about using the Azure Speech SDK for low-latency input streaming, so that you can listen to LLMs in real time as the text is being generated. I will be happy to assist you with this.

    Regarding your query: yes, Azure OpenAI's GPT models can generate text in chunks without waiting for the full response, allowing for a more interactive, real-time conversation.

    To achieve real-time speech synthesis with continuous chunks of text using the Azure Text-to-Speech API, you would need to implement streaming yourself.

    Here's a general Python approach you can follow using Azure OpenAI and the Azure Speech SDK:

    import os
    import openai
    import azure.cognitiveservices.speech as speechsdk
    
    # Set up the Azure OpenAI configuration (replace the placeholders).
    openai.api_type = "azure"
    openai.api_base = "OPENAI_API_BASE"
    openai.api_version = "2023-07-01-preview"
    openai.api_key = "OPENAI_API_KEY"  # or os.getenv("OPENAI_API_KEY")
    
    def generate_chat_completion(prompt):
        # Generate a chat completion with Azure OpenAI.
        response = openai.ChatCompletion.create(
            engine="test_Chatgpt",  # your deployment name
            messages=[{"role": "user", "content": prompt}],
            temperature=0.7,
            max_tokens=800,
            top_p=0.95,
            frequency_penalty=0,
            presence_penalty=0,
            stop=None
        )
        return response.choices[0].message["content"]
    
    def synthesize_and_stream(text_chunks):
        speech_config = speechsdk.SpeechConfig(subscription="YOUR_AZURE_SUBSCRIPTION_KEY", region="YOUR_AZURE_SUBSCRIPTION_REGION")
        speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
    
        # Synthesize each chunk in turn; audio plays on the default speaker.
        for chunk in text_chunks:
            result = speech_synthesizer.speak_text_async(chunk).get()
    
            if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
                print("Speech synthesized for text [{}]".format(chunk))
                # Handle the audio output here (e.g., play or save to a file).
            elif result.reason == speechsdk.ResultReason.Canceled:
                cancellation_details = result.cancellation_details
                print("Speech synthesis canceled: {}".format(cancellation_details.reason))
                if cancellation_details.reason == speechsdk.CancellationReason.Error:
                    if cancellation_details.error_details:
                        print("Error details: {}".format(cancellation_details.error_details))
                print("Did you update the subscription info?")
    
    # Example usage
    user_input = "A long relaxing speech that needs to be synthesized in chunks"
    chunk_size = 200  # Define the size of each text chunk
    text_chunks = [user_input[i:i + chunk_size] for i in range(0, len(user_input), chunk_size)]
    
    # Generate a chat completion for each input chunk and speak the response.
    for chunk in text_chunks:
        chat_response = generate_chat_completion(chunk)
    
        # Synthesize speech for the chat response
        synthesize_and_stream([chat_response])
    

    Please replace the placeholders in the above code with your actual Azure Text-to-Speech subscription key and region. This code breaks the input text into chunks and synthesizes each chunk individually, allowing you to manage the streaming of continuous text effectively.

    Keep in mind that this is a simplified example that I tried at my end based on the documentation. You may need to fine-tune it for your specific requirements and for integration with your application.

    Here is a step-by-step explanation of the above Python implementation:

    1. Break your text into smaller chunks: Divide the text you want to synthesize into smaller, manageable chunks.
    2. Send each chunk for synthesis: Send each chunk of text to the Text-to-Speech API for synthesis. You would typically use the speak_text_async method for each chunk.
    3. Handle the audio output: As each chunk is synthesized, you will receive audio output. You can then play this audio output or save it to a file, depending on your application's requirements.
    4. Manage latency: To achieve low latency, make sure you start processing the next chunk of text while the previous one is being synthesized. This way you can get closer to a real-time experience (see the sketch after this list).
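
    To make step 4 concrete, here is one possible sketch, not a tested production implementation: it streams tokens from Azure OpenAI with stream=True, buffers them into sentences, and queues each sentence on the synthesizer while later tokens are still arriving. The configuration values and the deployment name are placeholders, and it assumes the pre-1.0 openai package:

    import openai
    import azure.cognitiveservices.speech as speechsdk

    # Placeholder configuration -- replace with your own resource details.
    openai.api_type = "azure"
    openai.api_base = "OPENAI_API_BASE"
    openai.api_version = "2023-07-01-preview"
    openai.api_key = "OPENAI_API_KEY"

    speech_config = speechsdk.SpeechConfig(subscription="YOUR_AZURE_SUBSCRIPTION_KEY", region="YOUR_AZURE_SUBSCRIPTION_REGION")
    synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

    def stream_chat_to_speech(prompt):
        # stream=True returns tokens as they are generated, instead of one
        # complete response at the end.
        response = openai.ChatCompletion.create(
            engine="test_Chatgpt",  # placeholder deployment name
            messages=[{"role": "user", "content": prompt}],
            stream=True,
        )

        buffer = ""
        pending = []  # futures for in-flight synthesis requests
        for chunk in response:
            if not chunk["choices"]:
                continue
            buffer += chunk["choices"][0]["delta"].get("content", "")
            # Flush at sentence boundaries so the synthesizer receives
            # natural phrases rather than single tokens.
            if buffer.endswith((".", "!", "?", "\n")):
                # speak_text_async queues the request; we keep reading
                # tokens while earlier audio is being produced.
                pending.append(synthesizer.speak_text_async(buffer))
                buffer = ""
        if buffer:
            pending.append(synthesizer.speak_text_async(buffer))
        for future in pending:
            future.get()  # wait for all queued synthesis to finish

    stream_chat_to_speech("Tell me a short story about the ocean.")

    Buffering to sentence boundaries trades a little latency for better prosody; flushing on every token would start audio sooner but sound choppy.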

    Below is the output I got at my end by following the steps in the documentation linked below.

    Note: The output corresponds to the text stream passed as input. The voice also sounded fine at my end for the text stream below. For some reason, I couldn't attach the audio file here; I hope you understand.

    Output: [screenshot of console output]

    Please have a look at the documentation below for more details:

    How to lower speech synthesis latency using Speech SDK - Azure AI services | Microsoft Learn
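
    On the Speech SDK side, that article covers ways to receive audio before synthesis completes. As an illustrative sketch (my own example, not taken verbatim from the article), the synthesizer's synthesizing event fires with partial audio chunks as they arrive:

    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription="YOUR_AZURE_SUBSCRIPTION_KEY", region="YOUR_AZURE_SUBSCRIPTION_REGION")
    # audio_config=None stops the SDK from playing to the default speaker,
    # so we can consume the audio bytes ourselves as they stream in.
    synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=None)

    def on_synthesizing(evt):
        # Fires repeatedly with partial audio, before the full result is ready.
        print("received {} bytes of audio".format(len(evt.result.audio_data)))

    synthesizer.synthesizing.connect(on_synthesizing)
    synthesizer.speak_text_async("First audio bytes arrive before synthesis finishes.").get()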

    I hope this information helps! Give it a try and let me know how it works.

    1 person found this answer helpful.
