
KoheiWatanabe-9972 asked · 0 Votes

How does noise affect speech-to-text performance?

Hello. We are using Azure speech to text for Japanese.

We want to evaluate its performance.

Which factors affect the recognition result: noise, microphones, intonation, or something else?



azure-speech

romungi-MSFT answered · 1 Vote

@KoheiWatanabe-9972 Azure speech to text offers two options for the model used behind the service.

  1. Baseline Model

  2. Custom Model

With the baseline model you can call the API directly without any customization. This model is trained by Microsoft on audio with fairly typical background conditions. If your scenario involves recognizing everyday speech or recordings, this model should work for you right away through the API.

The custom model is meant for scenarios where the baseline model's accuracy does not meet your standards. For example, your speech may contain custom words such as acronyms or phrases specific to an organization, or it may be recorded on a factory floor with a lot of background noise. The custom model is trained with your own data on top of the baseline model, so all the capabilities of the baseline model are built into the resulting endpoint.

To summarize, the result from the service depends on your scenario, and all the factors you listed do affect it. If your scenario does not involve unusual background conditions, you can use the baseline model right away and evaluate its performance. Please check the FAQ document, which may help with more details.
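As a rough illustration of how such an evaluation could be scored, here is a minimal sketch that compares a recognized transcript against a human-made reference transcript. It assumes a character error rate (CER) is a reasonable metric for Japanese, since Japanese text has no spaces between words; this is an assumption for illustration, not necessarily the metric the service itself reports. The function names `edit_distance` and `character_error_rate` and the sample sentences are hypothetical. To see the effect of noise or microphones, you could record the same sentences under each condition and compare the scores.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance divided by reference length, ignoring whitespace."""
    ref = [c for c in reference if not c.isspace()]
    hyp = [c for c in hypothesis if not c.isspace()]
    if not ref:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: one character is misrecognized out of nine.
reference = "今日は良い天気です"
hypothesis = "今日は良い電気です"
cer = character_error_rate(reference, hypothesis)
print(f"CER: {cer:.3f}")
```

A lower score means fewer errors. Averaging the score over many utterances per condition (quiet room, noisy room, different microphones) gives a more reliable comparison than a single sentence.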



Thank you for your response!

I would like to use a custom model trained with both text and audio (or pronunciation),
but according to this page only text is currently supported for Japanese.

https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/language-support

Do you plan to support training the Japanese custom model with both text and audio?

I would like to know the roadmap for Azure Speech-to-Text.

0 Votes 0 ·

@KoheiWatanabe-9972 You should be able to use Japanese for custom voice (TTS) and custom speech (STT) from Speech Studio.
I am not sure whether the language support page mentions a limitation on this.

You can view Azure Speech announcements on the Azure updates page.


1 Vote 1 ·

Thank you! I confirmed that the model can be trained with both text and audio in Japanese!

So this document seems to be wrong:
https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/language-support

0 Votes 0 ·
KoheiWatanabe-9972 answered · 0 Votes

Hmm, the model could be trained with both audio and text, but only the text seems to be used...

This UI is really confusing.

[Image: Screenshot 2021-10-25 170910.png]


