Why has my TTS suddenly become bad? Speed & punctuation aren't working properly.
This morning I tried to work on my TTS file using Brian's voice, but once I listened to the speech, the punctuation & speed weren't working properly. The voice also seems to have become monotone. I've tried with an already-finished project to see if…
Azure pronunciation assessment video input
Can I give Azure pronunciation assessment a video input?
Azure Real Time Speech To Text fails to take input from Blob URL
I have implemented Azure Real-Time Speech to Text using the Speech SDK in Python for pre-recorded audio files. It works fine when the input audio is on my machine, but fails when I give the input as the Blob URL containing the audio. Please help!
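A common cause (an assumption here, since the error isn't shown): `AudioConfig(filename=...)` in the Speech SDK expects a local file path, not an HTTP(S) URL, so passing a Blob URL fails. One workaround is to download the blob to a temporary file first. The sketch below assumes `SPEECH_KEY`/`SPEECH_REGION` environment variables; `blob_suffix` and `recognize_blob` are illustrative names, not SDK APIs.

```python
import os
import tempfile
import urllib.parse
import urllib.request


def blob_suffix(blob_url):
    """Return the audio file extension from a blob URL, ignoring any SAS query string."""
    path = urllib.parse.urlparse(blob_url).path
    return os.path.splitext(path)[1] or ".wav"


def recognize_blob(blob_url):
    """Download the blob to a temp file, then run one-shot recognition on it.

    Returns the recognized text, or None if SPEECH_KEY/SPEECH_REGION are unset
    (so the sketch stays runnable without credentials).
    """
    key = os.environ.get("SPEECH_KEY")
    region = os.environ.get("SPEECH_REGION")
    if not key or not region:
        return None

    # AudioConfig(filename=...) needs a local path, so fetch the blob first.
    # A SAS token in the URL's query string is passed through untouched.
    with tempfile.NamedTemporaryFile(suffix=blob_suffix(blob_url), delete=False) as tmp:
        local_path = tmp.name
    urllib.request.urlretrieve(blob_url, local_path)

    import azure.cognitiveservices.speech as speechsdk  # imported lazily

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    audio_config = speechsdk.audio.AudioConfig(filename=local_path)
    recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config,
                                            audio_config=audio_config)
    result = recognizer.recognize_once_async().get()
    os.unlink(local_path)
    return result.text
```

An alternative, if downloading the whole file is undesirable, would be a `PushAudioInputStream` fed from chunked blob reads; the temp-file route above is the simplest to verify.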
Include custom audio files for keyword recognition training process
I am leveraging the Azure Keyword Recognition service; it works pretty well except for some false wakeups. We've collected a bunch of audio files that trigger false wakeups, and I was wondering whether there is some approach by which we can include these audio files in…
Azure Speech AI service Custom Commands Alternative
Hi, we are looking into using Azure AI services, especially the Speech service, to build a bot that performs certain tasks based on speech. For example, if we ask the bot to "Make a reservation for Instrument A from 9 AM to 10 AM", then the bot…
When will more avatars be available?
The Text to Speech Avatar has been in preview for about six months. Any idea when a full release will happen, and what will be in that release? Additional avatars? Adjustable clothing, hair, skin tone, …? Thanks
Abnormal timbre of the Azure AI Speech "zh-CN-XiaochenNeural" voice
Since early April, the timbre of the "Xiaochen" voice has been abnormal. At that time, attempts were made in regions such as East Asia, Southeast Asia, and East US, all of which showed the abnormality, except for the Japan…
How to get the spoken language in an audio file with the Azure Speech SDK in C#?
Hi, I need to detect the spoken language in an audio file. I have already read the documentation on language identification for the Speech service, but the SpeechRecognitionResult object doesn't contain the recognized language code. Is there a…
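In the Speech SDK the detected language is exposed through a wrapper, `AutoDetectSourceLanguageResult`, rather than directly on the recognition result. A sketch in Python (the C# API has the same shape: `AutoDetectSourceLanguageConfig` / `AutoDetectSourceLanguageResult`); the function name, candidate locales, and `SPEECH_KEY`/`SPEECH_REGION` variables are assumptions for illustration.

```python
import os


def detect_spoken_language(wav_path, candidates=("en-US", "fr-FR", "hi-IN")):
    """One-shot language identification on an audio file.

    Returns a BCP-47 code such as 'fr-FR', or None when SPEECH_KEY/SPEECH_REGION
    are not configured (so the sketch stays runnable without credentials).
    """
    key = os.environ.get("SPEECH_KEY")
    region = os.environ.get("SPEECH_REGION")
    if not key or not region:
        return None

    import azure.cognitiveservices.speech as speechsdk  # imported lazily

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    # At-start LID takes a small list of candidate locales per request.
    auto_detect = speechsdk.languageconfig.AutoDetectSourceLanguageConfig(
        languages=list(candidates))
    audio_config = speechsdk.audio.AudioConfig(filename=wav_path)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config,
        auto_detect_source_language_config=auto_detect,
        audio_config=audio_config)

    result = recognizer.recognize_once_async().get()
    # The language code lives on AutoDetectSourceLanguageResult,
    # not directly on the SpeechRecognitionResult.
    return speechsdk.AutoDetectSourceLanguageResult(result).language
```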
Azure pronunciation assessment: async assessment
I'm using the Azure speech recognizer SDK to do pronunciation assessment of an audio file. The problem: when the speech is in French, the results are always low and not expressive. const language = await detectSingleSpeechLanguage(text) …
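One frequent cause of uniformly low scores (an assumption here, since the full setup isn't shown): the recognizer's language stays at its default while the audio is French, so the assessment grades French speech against the wrong locale. A Python sketch of pronunciation assessment with the locale pinned explicitly; the function name and `SPEECH_KEY`/`SPEECH_REGION` variables are illustrative, and `detectSingleSpeechLanguage` from the question is the asker's own helper, not used here.

```python
import os


def assess_pronunciation(wav_path, reference_text, locale="fr-FR"):
    """Pronunciation assessment with the recognizer locale matched to the audio.

    Returns the accuracy score (0-100), or None when SPEECH_KEY/SPEECH_REGION
    are not configured (so the sketch stays runnable without credentials).
    """
    key = os.environ.get("SPEECH_KEY")
    region = os.environ.get("SPEECH_REGION")
    if not key or not region:
        return None

    import azure.cognitiveservices.speech as speechsdk  # imported lazily

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    # If this stays at the default (en-US) while the audio is French,
    # scores tend to come out uniformly low.
    speech_config.speech_recognition_language = locale

    pa_config = speechsdk.PronunciationAssessmentConfig(
        reference_text=reference_text,
        grading_system=speechsdk.PronunciationAssessmentGradingSystem.HundredMark,
        granularity=speechsdk.PronunciationAssessmentGranularity.Phoneme)

    audio_config = speechsdk.audio.AudioConfig(filename=wav_path)
    recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config,
                                            audio_config=audio_config)
    pa_config.apply_to(recognizer)

    result = recognizer.recognize_once_async().get()
    return speechsdk.PronunciationAssessmentResult(result).accuracy_score
```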
SSML: Using <lang xml:lang=""> within a multilingual voice sounds incorrect / unlike when used with the language-specific voice
I am developing a TTS application that pronounces "nonsense words" with specific language pronunciations. For example, I am using Polish-language voices to pronounce non-Polish words. If I use a Polish-specific voice, I hear what I expect…
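For comparison, the two setups the question contrasts look roughly like the SSML below. This is a hedged fragment, not a confirmed fix: the voice names are examples, and a multilingual voice rendering a `<lang>` span is not guaranteed to match a native language-specific voice's pronunciation.

```xml
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:mstts="http://www.w3.org/2001/mstts" xml:lang="en-US">
  <!-- Multilingual voice: <lang> switches the speaking language,
       but the accent model may differ from a native Polish voice. -->
  <voice name="en-US-JennyMultilingualNeural">
    <lang xml:lang="pl-PL">przykładowe słowo</lang>
  </voice>
  <!-- Language-specific voice for comparison. -->
  <voice name="pl-PL-ZofiaNeural">przykładowe słowo</voice>
</speak>
```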
Retirement Announcement - Upgrade to Text-to-Speech Neural Voice on 31 August 2024
Text-to-Speech currently supports both standard and neural voices. However, since the neural voices provide more natural-sounding speech output, and thus a better end-user experience, we are retiring the standard voices on 31 August 2024, and they will…
Error while trying to train a 20240228 Whisper Large v2 baseline model
When trying to train a custom speech model using a dataset containing an audio file and its transcript, the model failed to train due to an internal error. Can anyone provide any insights on how to troubleshoot this issue?
Is there any way to dub audio while maintaining its original intonation, breaks, and speed?
I have a voice recording with a lot of deeper and higher tones and some breaks and "word emphasis" at specific moments, but when using the "Speech Translation" functionality, this audio loses all of its life (all this complexity),…
Not able to use Azure AI Speech Avatar in ReactJS
Hello, I am trying to implement a live chat avatar using ReactJS in my application. When implementing the sample code, I am getting the following console logs: "is TURN server active? yes", "Avatar started.", "Speech and avatar synthesized to video…
Azure Text to Speech F0 (Free) Tier Limits
Hi, I have the F0 (Free) tier. I send a request to the TTS service and get the blendshape data and voice. When I make requests, the first 4 get a response; the 5th one no longer returns a response. If I restart my server, I can make another 4 request…
Speech Recognition live transcription not detecting any language other than English
Hi, I am using the Speech Recognition resource in my application for live transcription. It works perfectly with English, but when I try to speak in Hindi it is not detected. I want to build my application for multiple languages used in…
zh-CN-XiaochenNeural Abnormal timbre
zh-CN-XiaochenNeural has an abnormal timbre. The same problem occurred in October last year: https://learn.microsoft.com/en-us/answers/questions/1431823/the-timbre-of-the-voice-of-zh-cn-xiaochenneural-ha How long will it take to recover…
Is it possible to stream Groq LLM responses into Azure TTS as they arrive?
Hi! I'm trying to build a real-time LLM conversation bot and need it to be as low-latency as possible. I have successfully set up TTS audio output streaming…
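One common low-latency pattern (a sketch, not the only approach): buffer the streamed token deltas into sentence-sized chunks and hand each completed sentence to the synthesizer while the LLM is still generating. The chunker below is plain Python; the commented wiring at the end assumes the azure-cognitiveservices-speech package and a `groq_token_stream` iterable of text deltas, both of which are placeholders.

```python
import re


def sentences_from_stream(token_stream):
    """Yield complete sentences as soon as they appear in a token stream.

    Buffers incoming text deltas and flushes on sentence-ending punctuation,
    so TTS can start on the first sentence before generation finishes.
    """
    buf = ""
    for token in token_stream:
        buf += token
        # Flush every complete sentence currently in the buffer.
        while True:
            m = re.search(r"[.!?](?:\s+|$)", buf)
            if not m:
                break
            sentence, buf = buf[:m.end()].strip(), buf[m.end():]
            if sentence:
                yield sentence
    tail = buf.strip()
    if tail:
        yield tail  # whatever is left when the stream ends


# Hypothetical wiring (requires azure-cognitiveservices-speech and credentials):
#
# import azure.cognitiveservices.speech as speechsdk
# speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
# synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
# for sentence in sentences_from_stream(groq_token_stream):
#     synthesizer.speak_text_async(sentence)  # queue per-sentence synthesis
```

Per-sentence synthesis trades some prosody continuity for latency; smaller chunks (clauses) cut latency further but sound choppier.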
How to have multiple mstts:audioduration in a single <speak>?
I'm trying to adjust the duration of individual phrases so that the synthesized voice matches the voice in the original audio. It works perfectly when done like this: <speak xmlns="http://www.w3.org/2001/10/synthesis"…
Do Text-to-Speech containers provide visemes and blendshapes like the API?
I'm currently using the Speech API and consuming the visemes and blendshapes that are returned. In an effort to reduce latency, I would like to run the Speech services locally via the Text-to-Speech container. Does the response of the container TTS…