We would like to know how noise affects speech to text performance

Question

Hello. We are using Azure Japanese speech to text.

We want to evaluate its performance.

What factors affect the result: noise, microphones, intonation, or something else?

Accepted Answer

@Kohei Watanabe Azure speech to text offers two options for the model used behind the service:

Baseline Model
Custom Model

With the baseline model you can use the API directly without any customization. This model is trained by Microsoft on audio recorded under reasonably good background conditions. If your scenario involves recognizing everyday speech or recordings, this model should work for you right away through the API.

The custom model is meant for scenarios where the baseline model's accuracy does not meet your standards. For example, your speech may contain custom words such as acronyms or phrases used within an organization, or it may be recorded on a factory floor with a lot of background noise. The custom model is trained with your audio files on top of the baseline model, so all the capabilities of the baseline model are built into your resulting endpoint.

To summarize, the result from the service depends on your scenario, and all the factors you listed do affect it. If your scenario does not involve an unusual background environment, you can use the baseline model right away and evaluate its performance. Please check the FAQ document, which could help you with more details.
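One way to put a number on "evaluate the performance" is to compare the recognized text against a hand-made reference transcript under each condition (quiet room, noisy room, different microphones). Below is a minimal sketch, assuming you already have the recognized text as a string. It computes a character error rate, which is my assumption of a reasonable metric for Japanese because the text is not space-delimited. The function names and the sample transcripts are hypothetical and only for illustration; nothing here calls the Azure service.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance (substitutions, insertions, deletions) between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance divided by reference length, ignoring whitespace."""
    ref = [c for c in reference if not c.isspace()]
    hyp = [c for c in hypothesis if not c.isspace()]
    if not ref:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical transcripts: the same utterance recognized under two conditions.
reference = "今日は天気がいいです"
clean_result = "今日は天気がいいです"   # e.g. quiet room
noisy_result = "今日は元気がいいです"   # e.g. background noise, one wrong character

print("clean:", character_error_rate(reference, clean_result))
print("noisy:", character_error_rate(reference, noisy_result))
```

Running the same set of recordings through the baseline model while changing one factor at a time (noise level, microphone, speaker) and comparing the error rates should show which factor hurts most in your environment.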

Answer

Hmm, the model can supposedly be trained with both audio and text, but it looks like only the text is used...

This UI is really confusing...

[Screenshot attachment: 143308-スクリーンショット-2021-10-25-170910.png]

Answer

I asked a follow-up question here: https://learn.microsoft.com/en-us/answers/questions/602542/is-the-feature-of-training-the-custom-speech-model.html
