Azure IVR Message Recording Capability Questions

Lester, Gregory (VEC) 0 Reputation points
2026-05-14T20:37:49.71+00:00

We are a state‑level agency in the Commonwealth of Virginia currently evaluating enterprise‑grade Voice‑to‑Text, Text‑to‑Speech, and Speech Translation capabilities. Our existing environment is built on AWS, but we are exploring cross‑cloud options to ensure we have the most accurate, secure, and scalable solution for our operational needs.

Our primary requirement is the ability to generate IVR audio prompts, delivered in WAV format for integration into our telephony infrastructure. In addition, we are assessing speech transcription and multilingual translation capabilities to support accessibility, public‑facing services, and compliance mandates.

We would be interested in any demos, sandbox environments, or guided evaluations that Microsoft can provide. We attempted to set up a temporary Azure account for testing but ran into issues during the onboarding process. Any assistance or direction on how to properly access a trial environment would be greatly appreciated. Thank you for your time.

Azure Speech in Foundry Tools

3 answers

  1. Anshika Varshney 12,115 Reputation points Microsoft External Staff Moderator
    2026-05-19T09:00:41.8366667+00:00

    Hi Lester, Gregory (VEC),

    Thanks for sharing the details. I understand you are still facing the issue even after trying the earlier steps.

    Let’s go a bit deeper and check a few important areas that usually cause this kind of problem.

    From your scenario, this looks like a backend or configuration level issue, most commonly related to permissions, storage setup, or project configuration in Azure AI Foundry.

    Please go through the checks below one by one:

    First, check the storage account connection. Evaluation in Foundry requires a storage account to be linked with the project. If the storage is not connected properly, the evaluation cannot read or write data and can fail silently or with errors. For details, see: Evaluation troubleshooting guide [learn.microsoft.com]

    Next, verify role permissions. Even if you have Contributor or Owner, evaluation features need specific permissions. The Foundry project identity or your user should have proper access, such as Storage Blob Data Contributor on the storage account. If this role is missing, it can result in 403 (Forbidden) errors.

    Check that the deployment and project are in a healthy state. Make sure:

    • The model deployment is active and not in a failed or bad state
    • The project does not have any backend errors

    Sometimes deployments fail due to quota, region capacity, or dependency issues. [azureossd.github.io]

    Validate region and quota. Some Foundry features depend on region availability and quota. If the region has limited capacity or quota is not assigned, operations may fail even if the configuration looks correct. [opentools.ai]

    Check for permission- or access-related blocking. A 403 error usually means access is being blocked by one of the following:

    • Missing RBAC role
    • Conditional access or policy restriction
    • Network or firewall settings

    In many cases, even if setup looks correct, missing data plane permissions or policies can block the action. [learn.microsoft.com]

    Try one quick isolation step. Create a brand-new project with:

    • New storage connection
    • Small test dataset
    • Same region

    If the issue still happens, it helps confirm that the issue is not with your configuration but something outside your control.

    If you can share whether your storage is connected and what roles are assigned, it will help narrow this down faster.
    I hope this helps. Do let me know if you have any further queries.

    Thank you!

  2. kagiyama yutaka 3,430 Reputation points
    2026-05-15T12:44:34.2533333+00:00

    I think the quickest safe path is to create the IVR prompts as mono, 16 kHz WAV files in Speech Studio, open a support ticket if the trial sign-up blocks you, and ask your Microsoft account representative for a temporary evaluation as a fallback.
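
Before loading exported prompts into the telephony system, it may help to confirm that each file really is mono, 16 kHz, 16‑bit PCM. The following is a minimal sketch using only Python's standard `wave` module. The function name `check_ivr_wav` and the exact target values are illustrative assumptions taken from the format mentioned in this thread, not an official Microsoft validation tool.

```python
import wave

# Target format assumed from this thread: mono, 16 kHz, 16-bit PCM.
EXPECTED_CHANNELS = 1
EXPECTED_RATE_HZ = 16000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit samples


def check_ivr_wav(path):
    """Return a list of format problems; an empty list means the file matches."""
    problems = []
    try:
        with wave.open(path, "rb") as wav:
            channels = wav.getnchannels()
            rate = wav.getframerate()
            width = wav.getsampwidth()
    except (wave.Error, EOFError) as exc:
        # The wave module only reads uncompressed PCM WAV files.
        return [f"not a readable PCM WAV file: {exc}"]

    if channels != EXPECTED_CHANNELS:
        problems.append(f"expected 1 channel (mono), found {channels}")
    if rate != EXPECTED_RATE_HZ:
        problems.append(f"expected 16000 Hz, found {rate} Hz")
    if width != EXPECTED_SAMPLE_WIDTH_BYTES:
        problems.append(f"expected 16-bit samples, found {width * 8}-bit")
    return problems
```

Calling `check_ivr_wav("welcome.wav")` (a hypothetical file name) returns an empty list when the file matches, so the check can be run over a whole folder of prompts before upload.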

  3. AI answer

    2026-05-14T20:38:19.72+00:00

    Azure Speech and Azure Communication Services together provide the capabilities described, including IVR prompt generation in WAV, speech transcription, and multilingual translation, with options suitable for government and cross‑cloud scenarios.

    1. IVR audio prompt generation (WAV output)
    • Azure Communication Services (ACS) Call Automation supports playing audio prompts in calls using:
      • Pre‑recorded WAV audio files, or
      • Text that is converted to speech via integration with Foundry Tools and Azure Text‑to‑Speech.
    • Supported file formats for ACS play actions include:
      • WAV files formatted as 16‑bit PCM, mono, 16 kHz.
      • MP3 files with ID3V2TAG (for the play action described in the ACS Call Automation docs).
    • For IVR scenarios, prompts can be created using Azure Text‑to‑Speech neural voices and exported as WAV files. The documentation notes that WAV files must be mono channel, 16 kHz for compatibility.
    • The Speech synthesis Audio Content Creation tool can be used to create compliant WAV files for IVR prompts.
    2. Voice‑to‑Text (speech transcription)
    • Azure Speech provides speech‑to‑text capabilities, including:
      • Real‑time transcription.
      • Batch transcription.
      • Language identification.
      • Speaker diarization.
      • Custom speech (for domain‑specific tuning).
    • For call‑center and telephony scenarios, integration is typically done via a telephony client connected to a SIP/RTP processor (for example, a Session Border Controller). The client:
      • Handles incoming audio streams.
      • Converts them to PCM.
      • Connects streams to Azure Speech using continuous recognition.
      • Routes transcripts for downstream processing (for example, Agent Assist or virtual agents).
    • Azure Speech also supports “ALAW in WAV container” and “MULAW in WAV container” for audio streaming, simplifying integration with telephony systems.
    3. Text‑to‑Speech and neural voices
    • Azure Text‑to‑Speech offers:
      • Prebuilt neural voices (over 100 voices) for natural, human‑like IVR and bot experiences.
      • Support for SSML to control prosody, pronunciation, and audio playback.
    • These neural voices can be used directly in ACS Call Automation via the integration with Foundry Tools to play personalized responses, or to generate WAV files for use in existing IVR systems.
    4. Speech translation and multilingual support
    • Azure Speech supports speech translation for real‑time, multilingual translation of speech:
      • Speech‑to‑speech and speech‑to‑text translation scenarios are supported.
      • Language identification can be used alongside speech‑to‑text or translation to detect the spoken language.
    • LLM speech (preview) provides LLM‑enhanced transcription and translation for pre‑recorded audio, with use cases such as captions, meeting notes, and call‑center support.
    5. Government and data residency considerations
    • For US government entities and partners, Azure Speech is available in Azure Government (for example, US Gov Virginia and US Gov Arizona regions), with:
      • Speech‑to‑text (real‑time and batch, language ID, speaker diarization, custom speech).
      • Text‑to‑speech (standard and neural voices).
      • Speech translation.
    • Some advanced features (for example, LLM speech, fast transcription, pronunciation assessment, custom voice, personal voice, text‑to‑speech avatar) are not currently supported in Azure Government.
    • Speech containers can be used to deploy Speech services closer to data for compliance and security reasons.
    6. Responsible use and privacy
    • When using speech‑to‑text and related capabilities, it is necessary to:
      • Comply with applicable privacy, eavesdropping, and wiretap laws.
      • Collect and process only audio within the reasonable expectations of users.
      • Obtain necessary and appropriate consents for collecting, processing, and storing audio data.
    • Speech‑to‑text is not intended for covert audio surveillance or use in locations where users have a reasonable expectation of privacy.
    7. Telephony and IVR integration patterns
    • For call‑center and IVR scenarios:
      • A telephony client integrates the SIP/RTP audio with Azure Speech for continuous recognition.
      • Results can be used for:
        • Agent Assist (analyzing transcripts).
        • Virtual agents (for example, via Azure Bot Framework or Power Virtual Agents).
    • Azure Communication Services Call Automation provides:
      • Telephony integration and real‑time event triggers.
      • The ability to play prompts (WAV or TTS), recognize conversational voice inputs, and orchestrate AI‑powered flows via Foundry Tools.
    8. Trials, demos, and sandbox options
    • For assessment and evaluation, typical options include:
      • Azure Free Account / trial subscription for testing Speech and ACS capabilities.
      • Microsoft Learn sandbox environments for short‑lived, free subscriptions suitable for hands‑on evaluation.
    • If issues occur during onboarding or trial setup, one approach (from similar Q&A guidance) is:
      • Create a trial Microsoft 365 or Azure tenant to gain admin access.
      • Use the admin center to open a support ticket and request assistance from the appropriate support or Data Protection team.
    9. IVR with Microsoft Copilot Studio and Omnichannel (optional pattern)
    • For organizations using Dynamics 365 Omnichannel for Customer Service, Microsoft Copilot Studio offers native voice authoring for IVR‑like bots, including:
      • DTMF input handling.
      • Silence detection and reprompts.
      • Barge‑in control.
      • SSML support and audio file playback.
      • Enhanced speech recognition based on bot content.
    • This can be combined with Azure Speech and ACS for a full IVR and contact‑center solution.
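
As an illustration of the SSML support mentioned in the Text‑to‑Speech section above, here is a minimal sketch that wraps a prompt in an SSML document, using only the Python standard library. The helper name `build_ivr_ssml`, the default voice `en-US-JennyNeural`, and the `rate` value are assumptions chosen for the example; pick the voice, language, and prosody settings that suit your deployment.

```python
from xml.sax.saxutils import escape, quoteattr

# Namespace used by SSML documents.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ivr_ssml(text, voice="en-US-JennyNeural", lang="en-US", rate="medium"):
    """Wrap prompt text in a single-voice SSML document.

    The voice, language, and rate defaults are illustrative assumptions.
    """
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang={quoteattr(lang)}>'
        f"<voice name={quoteattr(voice)}>"
        # escape() protects characters such as & and < inside the prompt text.
        f"<prosody rate={quoteattr(rate)}>{escape(text)}</prosody>"
        "</voice></speak>"
    )


if __name__ == "__main__":
    print(build_ivr_ssml("For claims & benefits, press 1."))
```

The resulting string could then be passed to a synthesizer. With the Azure Speech SDK for Python, for example, this would typically go through `SpeechSynthesizer.speak_ssml_async` with the output format set to `Riff16Khz16BitMonoPcm`, although that step needs a working subscription and is not shown here.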

    AI-generated content may be incorrect. Read our transparency notes for more information.
