Custom Speech Audio + Transcript

Question


dddd 61

I see that an audio snippet cannot exceed 60 seconds.
I have uploaded data that is longer than 60 seconds, and the custom speech model still seemed to train.
What is happening with the rest of my data?
Is it being truncated?
Would it be better to stay under 60 seconds?

Thank you.

romungi-MSFT 49,101 Reputation points Microsoft Employee Moderator

2022-04-18T11:03:47.11+00:00

@dddd Did the below explanation help to clarify your question about training data >60s?


Answer 1

@dddd I answered a similar question earlier this year about how the trained model behaves when the audio is longer than 60 seconds. Please refer to the complete thread and conversation here.

To summarize, using files longer than 60 seconds does not compromise training, despite the stated limit. The guidance is based on training the acoustic model, which needs only up to 60 seconds of data.
The audio files help train the model on the audio quality and background conditions that you will probably have in all your future files. So if a file is longer than 60 seconds, the rest of its data is not ignored. The transcripts in the text file should be accurate so that the model is trained accurately for any length of audio.
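
If you want to see which of your training files run past the 60-second guidance, a minimal sketch like the one below may help. It assumes the audio is uncompressed WAV that Python's standard `wave` module can read. The function names and the `GUIDELINE_SECONDS` constant are hypothetical, chosen only for illustration. It only reports durations locally and does not call the Speech service.

```python
import wave
from pathlib import Path

# Assumption: 60 seconds is the per-file guidance discussed in this thread.
GUIDELINE_SECONDS = 60.0


def wav_duration_seconds(path):
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        # Duration = number of audio frames / frames per second.
        return wav.getnframes() / wav.getframerate()


def files_over_guideline(folder, limit=GUIDELINE_SECONDS):
    """Return (file name, duration) pairs for WAV files longer than `limit`."""
    over = []
    for path in sorted(Path(folder).glob("*.wav")):
        duration = wav_duration_seconds(path)
        if duration > limit:
            over.append((path.name, duration))
    return over
```

For example, `files_over_guideline("training_audio")` (a hypothetical folder name) would list the files you might choose to split, although per the answer above the longer files are not ignored.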

If an answer is helpful, please accept or upvote it, which might help other community members reading this thread.
