How to slightly modify Azure STT (Speech to Text) to output filler words like "hmm" and "uh"?

Question

Fei Yu 21

We are using Azure Speech to Text, and it generally works well. However, our application needs to transcribe everything that is said, including filler words such as "hmm" and "uh". The Azure base model currently ignores them. I searched the documentation and found that it seems possible to train a custom model to change the behavior of the base (default) model. However, that seems too heavy for me, since we would need to construct a dataset, and I am not sure how effective the effort would be.

Is there an easier way to achieve my goal? Or, if training a custom model is the only way, how much data would be required for this purpose?

Thanks!

1 answer

Answer 1

Ramr-msft 17,826

@Fei Yu Thanks for the question. You can use phrase lists to improve recognition. You don't need a large dataset; simply provide a word or phrase to boost its recognition.

Here is the documentation for phrase lists.
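As a rough illustration of the phrase-list suggestion, here is a minimal Python sketch using the Speech SDK (`azure-cognitiveservices-speech`). The filler list, the helper names `add_filler_phrases` and `transcribe_once`, and the `key`, `region`, and `wav_path` parameters are assumptions made for this example, not part of the original answer. Whether boosting these words is enough to make the service emit them is something to verify on your own audio.

```python
# Illustrative filler words; adjust to what your speakers actually say.
FILLER_WORDS = ["hmm", "uh", "um", "er"]


def add_filler_phrases(phrase_list, fillers=FILLER_WORDS):
    """Register each filler word on an object exposing addPhrase()."""
    for word in fillers:
        phrase_list.addPhrase(word)
    return phrase_list


def transcribe_once(key, region, wav_path):
    """Recognize one utterance from a WAV file with filler words boosted."""
    # Imported lazily so the helper above works without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    audio_config = speechsdk.audio.AudioConfig(filename=wav_path)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )

    # Attach a phrase list to this recognizer and add the filler words to it.
    phrase_list = speechsdk.PhraseListGrammar.from_recognizer(recognizer)
    add_filler_phrases(phrase_list)

    result = recognizer.recognize_once()
    return result.text
```

Calling `transcribe_once("<your-key>", "<your-region>", "sample.wav")` would return the recognized text for a single utterance; the key, region, and file name here are placeholders.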

Fei Yu 21 Reputation points

2022-09-21T06:01:07.69+00:00

@Ramr-msft, thank you so much for your suggestion. It makes sense. I will give it a try.
