@MlyamaeYuichi-6843 Based on feedback from the product group, using longer files is not an issue: the training process still uses these files to improve recognition of the custom terms in your data, and it does not ignore anything beyond 60s. Shorter files help in training the acoustic part of the model.
To summarize, the text files or transcripts play a bigger role in creating the model, so it is important to make sure the correct text is added.
The audio files complement the transcripts by helping train the model on the audio quality or background noise that you will probably have in all your future files. Short audio is preferable for training the acoustic model.
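If you want to see which of your training files are on the shorter side, a minimal sketch like the one below may help. It only uses Python's standard `wave` module, so it assumes uncompressed PCM WAV files. The 60-second threshold is simply the figure mentioned in this thread, not an official service limit, and the function names and the folder name in the usage note are hypothetical.

```python
import wave
from pathlib import Path

# Assumption: 60 seconds is the length discussed in this thread,
# not a documented limit of the service.
SHORT_CLIP_SECONDS = 60.0


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def split_by_length(folder, threshold=SHORT_CLIP_SECONDS):
    """Group the WAV files in `folder` into short and long clips by file name."""
    short, long_ = [], []
    for path in sorted(Path(folder).glob("*.wav")):
        if wav_duration_seconds(path) <= threshold:
            short.append(path.name)
        else:
            long_.append(path.name)
    return short, long_
```

For example, `short, long_ = split_by_length("training_audio")` would list which files fall at or under the threshold. Longer files are still used, as noted above, so this is only a way to see how much short audio you have.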
I hope this helps.