@Fei Yu Thanks for the question. You can use a phrase list to improve the recognition of specific words. It doesn't require a large dataset or a custom model: you simply provide the words or phrases whose recognition you want to boost.
Here is the documentation for phrase lists.
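A minimal sketch of attaching a phrase list with the Azure Speech SDK for Python (package `azure-cognitiveservices-speech`). The key, region, WAV path, and the filler-word list are placeholders chosen for illustration. A phrase list only biases recognition toward the listed words. Whether it makes the service actually emit fillers like "hmm" that the base model normally drops is an assumption you would need to verify against your own audio.

```python
# Sketch: boost recognition of filler words with a phrase list.
# Assumes the azure-cognitiveservices-speech package; credentials and the
# filler list below are hypothetical placeholders.

# Hypothetical, illustrative list of disfluencies to boost.
FILLER_WORDS = ("hmm", "uh", "um", "uh-huh")


def dedupe_phrases(phrases):
    """Strip whitespace, drop empty entries and case-insensitive duplicates, keep order."""
    seen = set()
    result = []
    for phrase in phrases:
        cleaned = phrase.strip()
        key = cleaned.lower()
        if cleaned and key not in seen:
            seen.add(key)
            result.append(cleaned)
    return result


def transcribe_with_phrase_list(key, region, wav_path, phrases=FILLER_WORDS):
    """Recognize a single utterance from a WAV file with a phrase list attached."""
    # Imported lazily so the helper above works without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    audio_config = speechsdk.audio.AudioConfig(filename=wav_path)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )

    # Attach a phrase list to this recognizer; each entry is a hint, not a guarantee.
    phrase_list = speechsdk.PhraseListGrammar.from_recognizer(recognizer)
    for phrase in dedupe_phrases(phrases):
        phrase_list.addPhrase(phrase)

    # recognize_once() returns only the first recognized utterance.
    result = recognizer.recognize_once()
    return result.text
```

For longer recordings you would switch to continuous recognition, but the phrase list is attached to the recognizer the same way.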
We are using Azure speech to text and it generally works well. However, our application needs to transcribe everything, including fillers such as "hmm" and "uh", and the Azure base model currently drops them. I searched the documentation and it seems we could train a custom model to change the behavior of the base (default) model. However, that seems too heavy for us: we would need to build a dataset, and I am not sure the effort would be effective.
Is there an easier way to achieve this? Or, if training a custom model is the only way, how much data is required to train one for this purpose?
Thanks!