How to enable Real-Time Avatar Streaming API in Azure Speech service?

Ananth Hegde
2025-06-30T18:18:04.78+00:00

I would like to use the Azure Neural TTS Real-Time Avatar Streaming API.

I already have a Speech resource created in a supported region (e.g., East US), but I understand that access to avatar streaming is currently gated and requires approval.

Could you please guide me through the steps to:

Enable the Avatar preview feature in my Azure subscription

Get access to use the real-time avatar streaming endpoint

Confirm any regions, voices, or SKUs required for this feature

I have already reviewed the official WebSocket sample and would like to test the avatar functionality using the real-time TTS + viseme stream.

Please let me know what approvals or configuration steps are required in the Azure portal to proceed.

Azure AI Speech

1 answer

  1. Amira Bedhiafi, Volunteer Moderator
    2025-07-01T20:34:46.19+00:00

    Hello Ananth!

    Thank you for posting on Microsoft Learn.

    This functionality is in preview and not available by default. You’ll need to:

    • Apply for access through Microsoft. This usually involves submitting a preview access request form or contacting your Microsoft AI or Azure account representative.
    • Once approved, navigate to your Speech resource in the Azure portal.
    • Under Preview features (or a similarly named section), look for Text-to-Speech Avatars or Avatar real-time streaming, and turn it on.

    After approval and enabling preview:

    1. Verify that you’re using Speech SDK version 1.40 or later.
    2. Use a region that supports avatars, and make sure your Speech resource uses the S0 (Standard) tier.
    3. In code, create a WebRTC connection to the real-time avatar endpoint; the avatar video is streamed over WebRTC. Microsoft provides detailed samples for JavaScript, Python, and C#, as well as mobile clients, showing how to consume both the live video and the viseme (lip-sync) streams.
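    Steps 1 and 2 can be checked in code before you attempt a connection. Below is a minimal TypeScript pre-flight sketch, assuming you already know the installed SDK version and the resource's pricing tier (for example from your package.json and the Azure portal). The `AvatarSetup` shape and both function names are hypothetical helpers, not part of the Speech SDK.

    ```typescript
    // Hypothetical pre-flight check for the SDK version and pricing tier.
    // AvatarSetup, compareVersions and checkAvatarPrereqs are illustrative
    // helpers, not Speech SDK APIs.

    interface AvatarSetup {
      sdkVersion: string; // installed Speech SDK version, e.g. "1.40.0"
      sku: string;        // pricing tier of the Speech resource, e.g. "S0"
    }

    const MIN_SDK_VERSION = "1.40.0";

    // Compare dotted numeric versions; returns negative, zero, or positive.
    // Missing or non-numeric parts are treated as 0.
    function compareVersions(a: string, b: string): number {
      const pa = a.split(".").map((p) => parseInt(p, 10) || 0);
      const pb = b.split(".").map((p) => parseInt(p, 10) || 0);
      const len = Math.max(pa.length, pb.length);
      for (let i = 0; i < len; i++) {
        const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
        if (diff !== 0) return diff;
      }
      return 0;
    }

    // Returns a list of problems; an empty list means both checks pass.
    function checkAvatarPrereqs(setup: AvatarSetup): string[] {
      const problems: string[] = [];
      if (compareVersions(setup.sdkVersion, MIN_SDK_VERSION) < 0) {
        problems.push(`Speech SDK ${setup.sdkVersion} is older than ${MIN_SDK_VERSION}`);
      }
      if (setup.sku.toUpperCase() !== "S0") {
        problems.push(`Speech resource tier is ${setup.sku}, expected S0`);
      }
      return problems;
    }
    ```

    For example, `checkAvatarPrereqs({ sdkVersion: "1.38.0", sku: "F0" })` returns two problem strings, while `{ sdkVersion: "1.40.0", sku: "S0" }` returns an empty array.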

    Helpful links:

    https://learn.microsoft.com/en-us/azure/ai-services/speech-service/text-to-speech-avatar/real-time-synthesis-avatar

    https://azureaggregator.wordpress.com/2024/06/28/make-your-voice-chatbots-more-engaging-with-new-text-to-speech-features

    https://www.youtube.com/watch?v=JUIs063K6z

    Your resource should be located in one of the following regions and use the S0 (Standard) tier. Note that East US, mentioned in the question, is not on this list; East US 2 is.

    • Southeast Asia
    • North Europe
    • West Europe
    • Sweden Central
    • South Central US
    • East US 2
    • West US 2
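    For step 3, Microsoft's browser samples first request a short-lived relay (ICE) token from the Speech resource's region before opening the WebRTC connection. The sketch below checks a region name against the list above and builds that token request without sending it. The endpoint path is an assumption based on Microsoft's public avatar samples and should be checked against the current documentation; the helper names are hypothetical.

    ```typescript
    // Hypothetical helpers: region check plus relay (ICE) token request builder.
    // ASSUMPTION: the relay token path mirrors Microsoft's public avatar samples;
    // verify it against the current real-time avatar documentation.

    const AVATAR_REGIONS: ReadonlySet<string> = new Set([
      "southeastasia",
      "northeurope",
      "westeurope",
      "swedencentral",
      "southcentralus",
      "eastus2",
      "westus2",
    ]);

    // "East US 2" -> "eastus2", "West Europe" -> "westeurope"
    function normalizeRegion(name: string): string {
      return name.toLowerCase().replace(/[^a-z0-9]/g, "");
    }

    function isAvatarRegion(name: string): boolean {
      return AVATAR_REGIONS.has(normalizeRegion(name));
    }

    interface TokenRequest {
      url: string;
      headers: Record<string, string>;
    }

    // Builds (but does not send) the GET request for the relay token.
    function buildRelayTokenRequest(region: string, speechKey: string): TokenRequest {
      const r = normalizeRegion(region);
      if (!AVATAR_REGIONS.has(r)) {
        throw new Error(`Region "${region}" is not in the avatar region list`);
      }
      return {
        url: `https://${r}.tts.speech.microsoft.com/cognitiveservices/avatar/relay/token/v1`,
        headers: { "Ocp-Apim-Subscription-Key": speechKey },
      };
    }
    ```

    Because `normalizeRegion("East US")` yields `eastus`, `isAvatarRegion("East US")` returns `false`. The returned `url` and `headers` can be passed to `fetch`, and the ICE server details in the response are then used to configure the `RTCPeerConnection` for the avatar video.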

    Supported voices include the standard neural voices and other neural TTS voices listed in the voice catalog for your region.
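    To see which neural voices your region offers, you can query the regional voice list. This sketch builds the URL for the Text to Speech REST API's `voices/list` endpoint and filters the returned entries to neural voices. The endpoint, the `ShortName`/`Locale`/`VoiceType` field names, and the `Ocp-Apim-Subscription-Key` header requirement are assumptions to confirm against the REST API reference; the helper names are hypothetical.

    ```typescript
    // Hypothetical helpers for the regional voice catalog.
    // ASSUMPTION: GET https://<region>.tts.speech.microsoft.com/cognitiveservices/voices/list
    // (with an Ocp-Apim-Subscription-Key header) returns entries with these fields.

    interface VoiceInfo {
      ShortName: string;
      Locale: string;
      VoiceType: string;
    }

    function voiceListUrl(region: string): string {
      const r = region.toLowerCase().replace(/[^a-z0-9]/g, "");
      return `https://${r}.tts.speech.microsoft.com/cognitiveservices/voices/list`;
    }

    // Keep neural voices, optionally restricted to one locale (e.g. "en-US").
    function neuralVoices(voices: VoiceInfo[], locale?: string): string[] {
      return voices
        .filter((v) => v.VoiceType === "Neural")
        .filter((v) => locale === undefined || v.Locale === locale)
        .map((v) => v.ShortName);
    }
    ```

    Parsing the JSON response as `VoiceInfo[]` and passing it to `neuralVoices` gives the short names you can use in your synthesis configuration.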

    https://learn.microsoft.com/en-us/azure/ai-services/speech-service/text-to-speech-avatar/real-time-synthesis-avatar

