How can I get Azure Speech to Text (STT) to output filler words such as "hmm" and "uh"?

Fei Yu 21 Reputation points
2022-09-20T06:26:15.307+00:00

We are using Azure Speech to Text, and it generally works well. However, our application needs to transcribe everything that is spoken, including filler words such as "hmm" and "uh". The Azure base model currently leaves them out. I searched the documentation and found that we could apparently train a custom model to change the behavior of the base (default) model. That seems too heavy for us, though: we would need to build a dataset, and I am not sure how effective the effort would be.

Is there an easier way to achieve this? Or, if training a custom model is the only way, how much data would the training require for this goal?

Thanks!

Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.

1 answer

  1. Ramr-msft 17,736 Reputation points
    2022-09-21T03:05:10.97+00:00

    @Fei Yu Thanks for the question. You can use a phrase list to improve recognition. You don't need a large dataset: simply add the words or phrases whose recognition you want to boost.

    See the phrase list documentation for details.
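
A minimal sketch of the phrase list approach with the Speech SDK for Python (`azure-cognitiveservices-speech`) might look like the following. The filler list, the helper names `add_filler_phrases` and `recognize_once_with_fillers`, and the key, region, and file path arguments are all hypothetical and not from the thread. A phrase list only biases recognition toward the listed phrases, so whether the fillers actually appear in the transcript is an assumption you would need to verify on your own audio.

```python
from typing import Iterable

# Hypothetical filler list; extend it for your own domain and language.
FILLER_WORDS = ["hmm", "uh", "um", "er", "mm-hmm"]


def add_filler_phrases(grammar, fillers: Iterable[str] = FILLER_WORDS) -> int:
    """Add each distinct, non-empty filler to a phrase list.

    `grammar` is expected to expose addPhrase(str), as the SDK's
    PhraseListGrammar does. Returns the number of phrases added.
    """
    seen = set()
    for word in fillers:
        cleaned = word.strip()
        if cleaned and cleaned not in seen:
            grammar.addPhrase(cleaned)
            seen.add(cleaned)
    return len(seen)


def recognize_once_with_fillers(key: str, region: str, wav_path: str) -> str:
    """Recognize a single utterance from a WAV file with the fillers boosted."""
    # Imported lazily so the helper above can be used without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    audio_config = speechsdk.audio.AudioConfig(filename=wav_path)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )

    # Attach a phrase list to this recognizer and register the fillers.
    grammar = speechsdk.PhraseListGrammar.from_recognizer(recognizer)
    add_filler_phrases(grammar)

    result = recognizer.recognize_once()
    return result.text
```

Calling `recognize_once_with_fillers(key, region, "sample.wav")` with your own credentials and a short recording that contains "hmm" or "uh" is a quick way to check whether the phrase list is enough before investing in a custom model.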

    1 person found this answer helpful.
