@dddd I answered a similar question earlier this year about how the trained model behaves when the audio is longer than 60 seconds. Please refer to the complete thread and conversation here.
To summarize, using files longer than 60 seconds does not compromise training, within the limits mentioned in that thread. The 60-second guidance relates to training the acoustic model, which only needs up to 60 seconds of audio per file.
The audio files train the model on the audio quality and background conditions you are likely to have in all your future files. Using files longer than 60 seconds does not cause the rest of the data to be ignored. Make sure the transcripts in the text file are accurate, so that the model is trained correctly for audio of any length.
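If you want to see which of your training files go past the 60-second guidance before uploading, a small local check can help. This is a minimal sketch, assuming your training audio is a folder of PCM WAV files; the function names and the folder are hypothetical, and the script only reports long files. As described above, longer files are not ignored, so it does not reject or trim anything.

```python
import wave
from pathlib import Path

# Threshold taken from the 60-second guidance discussed in this thread (assumption: per file).
GUIDANCE_SECONDS = 60.0

def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds (frames / sample rate)."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()

def report_long_files(folder, limit=GUIDANCE_SECONDS):
    """Return (filename, duration) pairs for WAV files in `folder` longer than `limit`."""
    long_files = []
    for path in sorted(Path(folder).glob("*.wav")):
        duration = wav_duration_seconds(path)
        if duration > limit:
            long_files.append((path.name, duration))
    return long_files
```

For example, calling report_long_files("training_audio") on a hypothetical training_audio folder lists each file over 60 seconds with its length, which you can then compare against the limits mentioned in the linked thread.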