Azure Communication Services Call Automation – Which Cognitive Service is required for STT/TTS (Speech)? Can Azure OpenAI or Foundry be used?

Kynyk, Yaroslav 40 Reputation points
2026-04-16T11:00:33.2333333+00:00

Hello Azure Community,

I am building a telephony-based AI bot using Azure Communication Services (ACS) Call Automation with a phone number. The bot logic runs in my backend (Python, azure.communication.callautomation SDK) and generates text responses using Azure OpenAI.

In the Azure Portal, ACS provides a “Cognitive Services” configuration where it is possible to associate different Azure AI resources, including Azure OpenAI and Azure AI Foundry, as a Cognitive Service.

In my backend, I start the outbound call like this:

call_result = client.create_call(
    target_participant=call_invite,
    source_caller_id_number=source_caller_id,
    callback_url=config.ACS_CALLBACK_URL,
    cognitive_services_endpoint=config.ACS_COGNITIVE_ENDPOINT,
)

The call connects successfully, and my AI logic generates text correctly. However, Text‑to‑Speech (TTS) and Speech‑to‑Text (STT) fail, and ACS raises a PlayFailed event with a Cognitive Services authentication error (401).

My questions

Which Azure service is actually required for STT/TTS when using ACS Call Automation? Is ACS Call Automation hard‑wired to use Azure Speech (Azure AI Services – Speech), regardless of what other AI services are connected?

Can Azure OpenAI or Azure AI Foundry ever be used as the cognitive_services_endpoint for ACS Call Automation speech (TTS/STT)? Or are these services supported only for text reasoning in backend logic, but not for audio synthesis or recognition?

If Azure OpenAI / Foundry cannot be used for STT/TTS, is the correct and supported setup to associate one of the following with ACS, and use that endpoint for Call Automation?

  • an Azure Speech resource, or
  • an Azure AI Services (multi‑service) resource that includes Speech

The Azure Portal UI allows Azure OpenAI and Foundry to be selected as Cognitive Services for ACS, which is somewhat confusing, so I would appreciate a clear explanation of what is supported at runtime versus what is only a resource‑level association.

Thank you in advance for the clarification.

Azure Communication Services

Answer accepted by question author

  1. Golla Venkata Pavani 4,500 Reputation points Microsoft External Staff Moderator
    2026-04-16T17:12:05.99+00:00

    Hi @Kynyk, Yaroslav

    Thank you for reaching out about this issue.

    Azure Communication Services (ACS) Call Automation provides Speech‑to‑Text (STT) and Text‑to‑Speech (TTS) through its integration with Foundry Tools, and this integration is supported only when ACS is connected to a Multi‑service Azure AI Services (Cognitive Services) resource that includes Speech.

    How speech works in ACS Call Automation

    • ACS Call Automation exposes AI‑powered features such as:
      • Text‑to‑Speech (TTS) using plain text or SSML
      • Speech‑to‑Text (STT) for recognizing caller speech
    • These features are implemented via the Foundry Tools integration, which internally relies on the Azure AI Speech capability.
    • The integration only supports a Multi‑service Cognitive Service resource. When creating or connecting an Azure AI resource for ACS, Microsoft explicitly recommends using a Multi‑service Cognitive Service resource and ensuring that Speech is included and enabled.

    Through this setup, ACS Call Automation can:

    • Convert text or SSML into audio using Azure Text‑to‑Speech voices
    • Recognize spoken responses using Azure Speech‑to‑Text
    • Execute these capabilities via built‑in Play and Recognize actions without developers handling media streams directly
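Play and Recognize report their outcome as events posted to the callback URL, which is where a failure such as the PlayFailed event from the question shows up. The following is a minimal sketch, using only the standard library, of how a backend might summarize those events for logging. The function name `summarize_callback` is hypothetical, and the field names (`resultInformation`, `subCode`, `speechResult`) are assumptions about the event payload that should be checked against the events your callback actually receives.

```python
import json

# Event type names as they appear in the "type" field of each callback event.
PLAY_FAILED = "Microsoft.Communication.PlayFailed"
RECOGNIZE_COMPLETED = "Microsoft.Communication.RecognizeCompleted"
RECOGNIZE_FAILED = "Microsoft.Communication.RecognizeFailed"


def summarize_callback(body: str) -> list[str]:
    """Turn a callback request body (a JSON array of events) into log lines."""
    lines = []
    for event in json.loads(body):
        event_type = event.get("type", "")
        short_name = event_type.rsplit(".", 1)[-1]
        data = event.get("data", {})
        if event_type in (PLAY_FAILED, RECOGNIZE_FAILED):
            # Assumed shape: failure details live under data.resultInformation.
            info = data.get("resultInformation", {})
            lines.append(
                f"{short_name}: code={info.get('code')} "
                f"subCode={info.get('subCode')} message={info.get('message')}"
            )
        elif event_type == RECOGNIZE_COMPLETED:
            # Assumed shape: recognized speech is under data.speechResult.speech.
            speech = data.get("speechResult", {}).get("speech")
            lines.append(f"{short_name}: speech={speech!r}")
        else:
            lines.append(f"{short_name}: no action")
    return lines
```

Logging the code, subcode, and message from a failed Play makes it easier to tell an authentication problem on the connected AI resource apart from, for example, an invalid voice name or malformed SSML.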

    Role of Azure OpenAI and Azure AI Foundry

    • Azure OpenAI and Azure AI Foundry models are not documented as supported speech backends for ACS Call Automation’s built‑in STT/TTS.
    • These services are commonly used in sample architectures for:
      • Text reasoning
      • Natural language understanding
      • Generating responses after speech has already been converted to text
    • However, the speech layer itself (TTS/STT) is always provided by the Speech capability within the connected Multi‑service Cognitive Service resource, accessed through Foundry Tools.

    Correct and supported configuration

    For STT and TTS to work correctly with ACS Call Automation, the supported and documented setup is:

    • Connect ACS to an Azure AI Services – Multi‑service Cognitive Service resource that includes Speech
    • Use that resource as the Cognitive Services endpoint for Call Automation
    • Use Azure OpenAI separately in your backend logic if needed for text generation or decision‑making

    This design cleanly separates responsibilities:

    • ACS Call Automation + Foundry Tools > handles telephony, audio, STT, and TTS using Azure Speech
    • Azure OpenAI / Foundry models > handle text‑based AI reasoning and response generation
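This split of responsibilities can be sketched as a single conversational turn. Everything here is illustrative: `handle_turn`, `generate_reply`, and `speak` are hypothetical names, where `generate_reply` stands in for the Azure OpenAI call in the backend and `speak` stands in for whatever wrapper the backend uses around the Call Automation Play action.

```python
from typing import Callable


def handle_turn(
    recognized_text: str,
    generate_reply: Callable[[str], str],
    speak: Callable[[str], None],
) -> str:
    """One turn: text from STT goes in, a text reply is handed to TTS."""
    # The language model only ever sees and returns text.
    reply = generate_reply(recognized_text).strip()
    if not reply:
        # Fall back to a fixed prompt rather than playing an empty string.
        reply = "Sorry, I did not catch that. Could you repeat?"
    # The speech layer receives plain text and is responsible for the audio.
    speak(reply)
    return reply


if __name__ == "__main__":
    # Stand-ins so the sketch runs without any Azure resources.
    spoken: list[str] = []
    handle_turn("What are your opening hours?", lambda t: f"You asked: {t}", spoken.append)
    print(spoken)
```

Keeping the model call and the speech call behind separate callables means either side can be replaced or tested without touching the other.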

    Reference:
    https://learn.microsoft.com/en-us/azure/communication-services/samples/call-automation-ai?pivots=programming-language-javascript
    https://learn.microsoft.com/en-us/azure/communication-services/concepts/call-automation/azure-communication-services-azure-cognitive-services-integration
    If this answer is helpful, please click "Accept the answer" and "Yes", as this can benefit other community members. If you have any other questions, let me know in the comments and I would be happy to help.

    1 person found this answer helpful.

Answer accepted by question author

  1. Q&A Assist
    2026-04-16T11:01:00.5633333+00:00
    AI generated content. This question contains an answer created with AI from Q&A Assist.

    Azure Communication Services (ACS) Call Automation uses Azure AI Speech capabilities through the integration with Foundry Tools, and this integration is currently supported only when the connected Cognitive Service is a Multi‑service Cognitive Service resource that includes Speech.

    From the documented integration:

    • Call Automation provides AI-powered features such as text-to-speech (TTS) and speech-to-text (STT) via the Foundry Tools integration.
    • This integration “only supports Multi-service Cognitive Service resource”; when creating or connecting an Azure AI resource, the recommendation is to create or select a Multi-service Cognitive Service resource and ensure it is the one connected to ACS.
    • Through this connection, ACS can:
      • Use Text-to-Speech and SSML to play prompts to callers.
      • Use Speech-to-Text to recognize spoken responses.
    • The Play API in Call Automation can:
      • Play pre-recorded WAV files, or
      • Play regular text converted to speech via the Foundry Tools integration, which uses Azure Text-to-Speech voices.
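When a prompt needs SSML rather than plain text, the markup has to be well-formed XML, so any text coming from a language model should be escaped first. The following is a minimal sketch of that step using only the standard library. The helper name `build_ssml` is hypothetical, and the voice `en-US-JennyNeural` is just an example value; use a voice that is available to your Speech resource.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(
    text: str,
    voice_name: str = "en-US-JennyNeural",  # example voice, an assumption
    locale: str = "en-US",
) -> str:
    """Wrap reply text in a minimal SSML document with one voice element."""
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(locale)}>"
        # escape() protects against &, <, and > in model-generated text.
        f"<voice name={quoteattr(voice_name)}>{escape(text)}</voice>"
        "</speak>"
    )


if __name__ == "__main__":
    print(build_ssml("Your balance is < 10 EUR & your plan renews tomorrow."))
```

The resulting string is what would be passed to the SDK as the SSML play source; unescaped characters in the reply text are a common cause of a failed Play that has nothing to do with the connected resource.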

    Therefore, for STT/TTS with ACS Call Automation:

    1. The required and supported Cognitive Service for speech at runtime is a Multi-service Cognitive Service resource that includes Speech (effectively the Azure AI Services Speech capability). ACS Call Automation is designed to use this Speech capability via the Foundry Tools integration.
    2. Azure OpenAI and Azure AI Foundry resources themselves are not documented as valid speech backends for ACS Call Automation’s built-in TTS/STT. They are used in the sample architecture for text reasoning (for example, generating responses from recognized text) while ACS handles speech via the Foundry Tools + Speech integration.
    3. The correct supported setup for ACS Call Automation speech is:
      • Connect ACS to a Multi-service Cognitive Service resource (Azure AI Services) that includes Speech, and
      • Use that resource as the Cognitive Service endpoint for Call Automation so that Play/Recognize actions can perform TTS/STT.

    Azure OpenAI can still be used in the backend logic (for example, as in the Call Automation OpenAI samples) to generate text responses, but ACS’s speech layer (TTS/STT) is provided by the Speech capability in the connected Multi-service Cognitive Service resource, not by Azure OpenAI itself.

