
Alex-3873 asked

Azure Speech-to-text: missing base models that support custom model training with audio

Speech-to-text
In the docs, en-US is listed as a language that supports custom model training with audio data.
[Screenshot: 198484-s1.png]

But in Speech Studio there are no base models that support training with audio data. Where did they go?
[Screenshot: 198492-s2.png]


azure-speech

1 Answer

YutongTie-MSFT answered

@Alex-3873 Sorry for the misunderstanding and the unclear documentation. The table you shared is not about base models; it shows the language support for each feature. I know the description is confusing, and I will contact the content author to revise it.

The table you shared refers to the input data type, as shown in the screenshot below. It means that for the languages in this table, audio data input is supported:
[Screenshot: 198576-image.png]

In Speech Studio, this is reflected as follows:

[Screenshot: 198594-image.png]

Sorry again for the misunderstanding.

I hope this helps.

Regards,
Yutong

-Please kindly accept the answer if you found it helpful. Thanks a lot.



Thanks for the reply, Yutong.

The problem is I can create a dataset of type "audio + human-labeled transcripts", but I don't see any base models in the list that support training using audio.

See the screenshot below, taken from the "Add audio with human-labeled transcripts" page:

[Screenshot: 198695-s3.png]

Note that both highlighted links point to the same page I mentioned in my original question.

I've tried various regions, but I can't see any base models that support training with audio. The last time it worked was in mid-March.
I wonder if I'm missing something.


Regards
Alex
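One way to check which base models in a region accept audio training data, independently of the Speech Studio list, is to query the base-model listing of the Speech to text REST API. The sketch below is a minimal example under stated assumptions: it assumes the v3.1 endpoint `https://{region}.api.cognitive.microsoft.com/speechtotext/v3.1/models/base`, the `Ocp-Apim-Subscription-Key` header, `values`/`@nextLink` paging, and that each model reports `properties.features.supportsAdaptationsWith` containing `"Acoustic"` when audio training is supported. Verify these field names against the current API reference. `fetch_base_models` and `audio_trainable` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: Speech to text REST API v3.1 base-model listing path.
API_PATH = "speechtotext/v3.1/models/base"


def fetch_base_models(region: str, key: str) -> list[dict]:
    """Fetch all base models for a region, following @nextLink pagination (assumed response shape)."""
    url = f"https://{region}.api.cognitive.microsoft.com/{API_PATH}"
    models: list[dict] = []
    while url:
        req = urllib.request.Request(url, headers={"Ocp-Apim-Subscription-Key": key})
        with urllib.request.urlopen(req) as resp:
            page = json.load(resp)
        models.extend(page.get("values", []))
        url = page.get("@nextLink")  # None when there are no more pages
    return models


def audio_trainable(models: list[dict], locale: str) -> list[str]:
    """Return names of base models for `locale` that list "Acoustic" (audio) adaptation support."""
    names = []
    for m in models:
        if m.get("locale") != locale:
            continue
        features = m.get("properties", {}).get("features", {})
        if "Acoustic" in features.get("supportsAdaptationsWith", []):
            names.append(m.get("displayName", m.get("self", "")))
    return names
```

For example, calling `audio_trainable(fetch_base_models("eastus", key), "en-US")` for each region you have tried would show whether that region currently exposes any en-US base models with audio training support. An empty list in every region would point to a service-side change rather than a region limitation.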



Yes @Alex-3873

There are region limitations, but I have seen the same issue as well. Let me double-check with the product team what the expected behavior should be, and with the content team to update the document with a clearer description. Sorry about the experience.
[Screenshot: 198713-image.png]

I will update this thread once I get a response from them. Thanks for your understanding.

Regards,
Yutong



I see those base models with support for audio training data are back now.
Thanks for your help!

Hi Yutong.

Thanks for looking into it.
Are there any updates from the product team? Thanks.

Regards
Alex
