Features container Speech to text

Question

Fabio Puddu 25

Hello community,

I would like to ask for some information about Speech to text containers. I want to run the service on-premises, and I cannot find anything in the documentation on whether the following are also available in the on-premises version:

Costs

Multi-agent service

Anonymization

mp3 input

Thank you in advance

Anonymous

2025-02-14T18:46:13.82+00:00

Hi Fabio Puddu

Just checking to see if the below response was helpful.

Answer 1

Sina Salam 30,166 Volunteer Moderator

Hello Fabio Puddu,

Welcome to Microsoft Q&A, and thank you for posting your questions here.

I understand that you would like to run the Azure AI services Speech to Text container on-premises.

Running Speech to Text containers on-premises is a great choice for maintaining control over your data. However, you need to take the following into consideration for each item in your question:

Costs: The cost of running Speech to Text containers on-premises depends on your usage and the resources you allocate. For example, each decoder in batch processing mode can handle 2-3x real-time with two CPU cores. For details, see https://azure.microsoft.com/en-us/pricing/details/cognitive-services/speech-services
Multi-agent service: You can run multiple Speech to Text containers on the same host. This setup lets you handle multiple requests simultaneously, which is useful for multi-agent services. - https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-container-stt
Anonymization: The Speech to Text containers support various features, but the documentation does not explicitly mention anonymization capabilities. You might need to implement additional data processing layers to ensure anonymization. See the link above for details.
MP3 input: The containers can handle various audio formats, including MP3. You can use the docker run command to run the container and specify the audio input format. - https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-container-stt
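
To make the cost point a bit more concrete, here is a rough capacity estimate in Python based on the "2-3x real-time with two CPU cores" figure quoted above. It is only a back-of-the-envelope sketch: the function names are made up for illustration, and it assumes the audio splits evenly across decoders and that throughput scales linearly, which a real deployment may not achieve.

```python
import math


def estimate_batch_hours(audio_hours: float, decoders: int, speed_factor: float) -> float:
    """Estimate wall-clock hours needed to transcribe `audio_hours` of audio.

    Assumption (not from the docs): work splits evenly across `decoders`
    and each decoder runs at `speed_factor` times real time.
    """
    if audio_hours < 0 or decoders < 1 or speed_factor <= 0:
        raise ValueError("audio_hours must be >= 0, decoders >= 1, speed_factor > 0")
    return audio_hours / (decoders * speed_factor)


def cores_needed(decoders: int, cores_per_decoder: int = 2) -> int:
    """CPU cores to reserve, using the two-cores-per-decoder figure quoted above."""
    if decoders < 1 or cores_per_decoder < 1:
        raise ValueError("decoders and cores_per_decoder must be >= 1")
    return decoders * cores_per_decoder


if __name__ == "__main__":
    audio_hours = 100.0  # hypothetical backlog of recorded calls
    decoders = 4         # hypothetical number of parallel decoders
    for speed in (2.0, 3.0):  # lower and upper bound of the quoted range
        hours = estimate_batch_hours(audio_hours, decoders, speed)
        print(f"{speed:.0f}x real-time: ~{hours:.1f} h on {cores_needed(decoders)} cores")
```

Treat the result as a planning range only and measure actual throughput on your own hardware before sizing the host.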

For more reading and more detailed guidance: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-container-faq and links provided above.
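
On the anonymization point, one possible "additional layer" is to post-process the transcript text yourself after the container returns it. The Python sketch below masks email addresses and phone-like digit runs with regular expressions. It is not a feature of the Speech to Text container, the patterns and the `redact_transcript` name are illustrative assumptions, and simple regexes will miss many cases (for example, numbers that the transcript spells out as words).

```python
import re

# Illustrative patterns only; real PII detection usually needs far more than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# A digit, then at least 7 digits/spaces/punctuation characters, then a digit.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_transcript(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholder tags."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


if __name__ == "__main__":
    sample = "Call me at +39 333 123 4567 or mail fabio@example.com."
    print(redact_transcript(sample))
```

Running the redaction on your own infrastructure keeps the unmasked transcript on-premises, which fits the reason for choosing containers in the first place.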

I hope this is helpful! Do not hesitate to let me know if you have any other questions.

Please don't forget to close the thread by upvoting this reply and accepting it as the answer if it is helpful.

Fabio Puddu 25 Reputation points

2025-02-13T15:55:22.24+00:00

Thank you for your answer. By multi-agent I mean detecting the user and the agent in a phone call. Is this service available on-premises?

Sina Salam 30,166 Reputation points Volunteer Moderator

2025-02-13T16:16:50.1633333+00:00

Hi Fabio Puddu,

Thank you for requesting clarification.

Yes, multi-agent detection services for phone calls are available on-premises. For example, Aculab (https://www.aculab.com/answering-machine-detection) offers an AI-driven Answering Machine Detection (AI-AMD) solution that classifies whether a call is answered by a human or by voicemail. This service is available both on-premises and on their cloud platform.

However, Azure AI services, including those for call automation and user-agent detection, are typically cloud-based. You can integrate these services into your on-premises systems using Azure Communication Services and Azure OpenAI Service. - https://learn.microsoft.com/en-us/azure/communication-services/samples/call-automation-azure-openai-sample

Fabio Puddu 25 Reputation points

2025-02-13T16:20:27.16+00:00

Does this service recognize two humans as well?

Sina Salam 30,166 Reputation points Volunteer Moderator

2025-02-13T16:23:37.0166667+00:00

Tone and accent are part of voice or speech detection.
