How to generate real-time audio transcription that includes model response (thinking/reasoning) as well

GenixPRO 111 Reputation points
2025-06-06T22:29:56.86+00:00

Hi.

Our use case is:

  1. A conversational chat interface where the user speaks with the model and a real-time transcription is displayed in the chat window.
  2. The chat also includes the model's response, i.e. its thinking/reasoning, not merely a transcription of the user's speech.
  3. What model would be suitable for this? Does the Realtime API (gpt-4o-realtime-preview) support this use case? Does it need to be paired with gpt-4o-mini-transcribe? If so, how can we get this model combination to work? Is there a better way? Can GPT-4.1 be used for this?
Azure AI services

1 answer

  1. Azar 29,520 Reputation points MVP Volunteer Moderator
    2025-06-07T09:41:17.3666667+00:00

    Hi GenixPRO,

    Thanks for using the Q&A platform.

    Yes, your use case is achievable with a combination of Azure OpenAI models. To get real-time audio transcription along with model-generated responses (thinking/reasoning), you can pair gpt-4o-mini-transcribe for speech-to-text with gpt-4o-realtime-preview for reasoning and chat responses.

    The gpt-4o-mini-transcribe model handles real-time transcription of the user's speech, while gpt-4o-realtime-preview processes that text and provides intelligent responses in real time, similar to how ChatGPT with voice works. You can stream audio input to the transcription service and, as partial or final transcriptions arrive, pass them to the GPT model and stream its response back into the chat UI. This setup gives you both live transcription and natural conversational replies from the model.

    GPT-4.1 isn’t optimized for real-time streaming, so gpt-4o-realtime-preview is the better fit here.
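
As a rough illustration of the flow described above, here is a minimal Python sketch of the chat-window side only. It makes no network calls. It assumes a variation on the pairing: the realtime session itself is asked to transcribe the user's audio through an `input_audio_transcription` setting that names gpt-4o-mini-transcribe, rather than calling a separate transcription service. The payload shape and the event names (`conversation.item.input_audio_transcription.delta`, `conversation.item.input_audio_transcription.completed`, `response.audio_transcript.delta`, `response.text.delta`) follow my understanding of the preview Realtime API and may differ in your API version, so please verify them against the current reference. `build_session_update` and `ChatWindow` are hypothetical helper names, not part of any SDK.

```python
import json
from dataclasses import dataclass, field


def build_session_update(transcription_model: str = "gpt-4o-mini-transcribe") -> str:
    """Return a JSON `session.update` event asking the realtime session to
    also transcribe the user's audio. Payload shape is an assumption."""
    event = {
        "type": "session.update",
        "session": {
            "modalities": ["text", "audio"],
            "input_audio_transcription": {"model": transcription_model},
        },
    }
    return json.dumps(event)


@dataclass
class ChatWindow:
    """Collects user and assistant text per conversation item, in arrival order."""

    lines: dict = field(default_factory=dict)  # item_id -> [speaker, text]

    def _append(self, item_id: str, speaker: str, text: str) -> None:
        entry = self.lines.setdefault(item_id, [speaker, ""])
        entry[1] += text

    def handle(self, raw_event: str) -> None:
        """Route one server event (JSON string) into the chat transcript."""
        event = json.loads(raw_event)
        kind = event.get("type")
        if kind == "conversation.item.input_audio_transcription.delta":
            # Partial transcription of the user's speech.
            self._append(event["item_id"], "user", event["delta"])
        elif kind == "conversation.item.input_audio_transcription.completed":
            # Final transcription replaces whatever partial text was shown.
            self.lines[event["item_id"]] = ["user", event["transcript"]]
        elif kind in ("response.audio_transcript.delta", "response.text.delta"):
            # Streamed text of the model's reply.
            self._append(event["item_id"], "assistant", event["delta"])
        # All other event types are ignored in this sketch.

    def render(self) -> list:
        return [f"{speaker}: {text}" for speaker, text in self.lines.values()]
```

In a real client you would send the string from `build_session_update()` over the realtime WebSocket after connecting, call `ChatWindow.handle` on every incoming message, and redraw the chat UI from `render()`.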

    If this helps, kindly accept the answer. Thanks very much.

