Hi GenixPRO,
Thanks for using the Q&A platform.
Yes, your use case is achievable with a combination of Azure OpenAI services. To get real-time audio transcription together with model-generated responses, you can pair gpt-4o-mini-transcribe for speech-to-text with gpt-4o-realtime-preview for reasoning and chat replies. The gpt-4o-mini-transcribe model handles live transcription of the user's speech, while gpt-4o-realtime-preview takes that text and produces intelligent responses in real time, similar to how ChatGPT with voice works. The flow is: stream the audio input to the transcription service, and as partial or final transcriptions arrive, pass them to the realtime model and stream its response back into your chat UI. This gives you both live transcription and natural conversational replies. GPT-4.1 isn't optimized for real-time streaming, so gpt-4o-realtime-preview is the better fit here.
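Below is a minimal sketch of that flow using the openai Python SDK against an Azure OpenAI resource. The endpoint, API key, API version, deployment names, and the captured audio file are placeholders, and the realtime handling is simplified to a single text-only turn; in a real app you would feed it partial transcriptions from your audio capture loop and push the streamed deltas into your chat UI.

```python
# Minimal sketch, assuming the `openai` Python SDK with Azure OpenAI.
# All endpoint/key/version/deployment values below are placeholders.
import asyncio
from openai import AsyncAzureOpenAI

client = AsyncAzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",  # placeholder
    api_key="<your-api-key>",                                   # placeholder
    api_version="<api-version>",                                # placeholder
)

async def transcribe_chunk(audio_path: str) -> str:
    """Send a captured audio chunk to gpt-4o-mini-transcribe and return the text."""
    with open(audio_path, "rb") as f:
        result = await client.audio.transcriptions.create(
            model="gpt-4o-mini-transcribe",  # your transcription deployment name
            file=f,
        )
    return result.text

async def respond(transcript: str) -> None:
    """Stream a text reply from gpt-4o-realtime-preview for one transcribed turn."""
    async with client.beta.realtime.connect(
        model="gpt-4o-realtime-preview",  # your realtime deployment name
    ) as conn:
        # Text-only for this sketch; add "audio" if you also want spoken replies.
        await conn.session.update(session={"modalities": ["text"]})
        await conn.conversation.item.create(
            item={
                "type": "message",
                "role": "user",
                "content": [{"type": "input_text", "text": transcript}],
            }
        )
        await conn.response.create()
        async for event in conn:
            if event.type == "response.text.delta":
                print(event.delta, end="", flush=True)  # push into your chat UI
            elif event.type == "response.done":
                break

async def main() -> None:
    text = await transcribe_chunk("user_turn.wav")  # hypothetical captured chunk
    print(f"User said: {text}")
    await respond(text)

asyncio.run(main())
```

The split keeps each concern swappable: the transcription call works per audio chunk, so you can invoke it on partial segments for a more "live" feel, while the realtime connection can stay open across turns instead of reconnecting per message as shown here.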
If this helps, kindly accept the answer. Thanks!