GPT Realtime 2 is a speech-to-speech model with built-in reasoning. It accepts audio input and produces audio output. It's designed for low-latency, interactive voice experiences where you need stronger instruction following and reasoning than earlier realtime models.
Note
This feature is currently in public preview. This preview is provided without a service-level agreement, and we don't recommend it for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.
What's new in GPT Realtime 2
- Reasoning support with an adjustable `reasoning.effort` control.
- Response phases that distinguish preambles ("commentary") from the final answer ("final_answer").
- Longer context window (256,000 tokens).
Key concepts
Reasoning effort
Control reasoning intensity with the `reasoning.effort` session parameter. Valid values are `minimal`, `low`, `medium`, and `high`.
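As a minimal sketch, the following Python builds a session configuration message that sets the reasoning effort. The `session.update` event type follows the pattern used by earlier realtime models. The nested `reasoning` object with an `effort` key is an assumption inferred from the dotted parameter name, and `build_session_update` is a hypothetical helper, so check the payload shape against the API reference before relying on it.

```python
import json

# Values listed in this article for the reasoning.effort parameter.
VALID_EFFORTS = ("minimal", "low", "medium", "high")


def build_session_update(effort: str) -> str:
    """Return a JSON string for a session update that sets reasoning effort.

    Assumption: the parameter is nested as session.reasoning.effort.
    """
    if effort not in VALID_EFFORTS:
        raise ValueError(
            f"Unsupported effort {effort!r}; expected one of {VALID_EFFORTS}"
        )
    event = {
        "type": "session.update",
        "session": {
            "reasoning": {"effort": effort},  # assumed nesting
        },
    }
    return json.dumps(event)


if __name__ == "__main__":
    # Send the resulting string over your existing realtime connection.
    print(build_session_update("low"))
```

Validating the value on the client before sending avoids a round trip for a configuration error.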
Preambles and response phases
Realtime responses can include multiple output items per turn. Each item has a phase that indicates its role:
| Phase | Description |
|---|---|
| `commentary` | A promptable preamble, often used before longer reasoning. |
| `final_answer` | The final answer after the model completes reasoning. |
Preambles can reduce perceived latency—for example, "Let me think about that…"—and can also be used for tool announcements or silence fillers. If the model is interrupted during thinking, it discards the current chain of thought and starts a new turn.
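One way a client might use the phase is to route preambles and final answers differently, for example by logging only the final answer. The sketch below assumes each output item arrives as a dictionary with a `phase` key and a `transcript` key. Both field names, and the `split_by_phase` helper, are hypothetical and used for illustration only; the actual item schema might differ.

```python
from typing import Iterable, List, Tuple


def split_by_phase(items: Iterable[dict]) -> Tuple[List[str], List[str]]:
    """Separate output item text into preambles and final answers.

    Assumption: each item is a dict with "phase" and "transcript" keys.
    Items with any other phase value are ignored.
    """
    commentary: List[str] = []
    final_answer: List[str] = []
    for item in items:
        text = item.get("transcript", "")
        phase = item.get("phase")
        if phase == "commentary":
            commentary.append(text)
        elif phase == "final_answer":
            final_answer.append(text)
    return commentary, final_answer


if __name__ == "__main__":
    # Hypothetical items from a single turn.
    turn = [
        {"phase": "commentary", "transcript": "Let me think about that..."},
        {"phase": "final_answer", "transcript": "Your order ships Tuesday."},
    ]
    preambles, answers = split_by_phase(turn)
    print(preambles)
    print(answers)
```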
Instruction following
Instruction following is stricter than in earlier realtime models. If your system prompt contains narrow wording (for example, distinguishing "order ID" from "confirmation code"), you might need to broaden or rephrase instructions to match real user phrasing.
Get started
The connection and usage patterns for GPT Realtime 2 are the same as for earlier versions—just deploy the new model and point your existing code at it. Choose the transport that fits your scenario: