Edit

Audio capabilities in Azure OpenAI in Microsoft Foundry Models (classic)

Applies only to: Foundry (classic) portal. This article isn't available for the new Foundry portal. Learn more about the new portal.

Note

Links in this article might open content in the new Microsoft Foundry documentation instead of the Foundry (classic) documentation you're viewing now.

Audio models in Azure OpenAI are available via the realtime, completions, and audio APIs, and support speech recognition, translation, and text to speech.

For information about the available audio models per region in Azure OpenAI, see the audio models, standard models by endpoint, and global standard model availability documentation.

Important

The content filtering system isn't applied to prompts and completions processed by audio models in Azure OpenAI, such as Whisper.

GPT-4o audio Realtime API

GPT real-time audio supports real-time, low-latency conversational interactions for scenarios that require responsive bidirectional audio exchange. For more information on how to use GPT real-time audio, see the GPT real-time audio quickstart and how to use GPT-4o audio.

GPT-4o audio completions

GPT-4o audio completion generates audio outputs from audio or text prompts. The GPT-4o audio completions model introduces the audio modality into the existing /chat/completions API. For more information on how to use GPT-4o audio completions, see the audio generation quickstart.

Audio API

The audio models via the /audio API can be used for speech to text, translation, and text to speech. To get started with the audio API, see the Whisper quickstart for speech to text.

Note

To help you decide whether to use Azure Speech in Foundry Tools or Azure OpenAI, see the Azure Speech batch transcription, What is the Whisper model?, and OpenAI text to speech voices guides.