Greetings!
The training failure for your German audio files with human-made transcripts in Azure Custom Speech is caused by audio files that exceed the 60-second limit. For training, Azure Custom Speech requires each audio file to be shorter than 60 seconds.
To fix this, trim or split your audio files so that each one is shorter than 60 seconds, and make sure each transcript still matches the audio in its file. Once every file is under the limit, training should proceed without this error.
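If it helps, here is a minimal sketch for finding the files that are too long before you upload them again. It assumes your training audio is in PCM WAV format and uses only Python's standard `wave` module. The folder name `training_audio` and the function names are placeholders for illustration, not part of any Azure tooling, and the 60-second value is the limit quoted above, so please confirm it against the current documentation.

```python
import wave
from pathlib import Path

# Training limit quoted in this answer (assumption: confirm in the current docs).
MAX_SECONDS = 60.0


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def find_long_files(folder, max_seconds=MAX_SECONDS):
    """Return (file name, duration) pairs for WAV files at or above the limit."""
    too_long = []
    for path in sorted(Path(folder).glob("*.wav")):
        duration = wav_duration_seconds(path)
        if duration >= max_seconds:
            too_long.append((path.name, round(duration, 2)))
    return too_long


if __name__ == "__main__":
    # "training_audio" is a hypothetical folder name; point this at your dataset.
    for name, duration in find_long_files("training_audio"):
        print(f"{name}: {duration} s")
```

Any file this lists would need to be trimmed or split, with its transcript updated to match, before you rebuild the dataset.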
For the full data requirements for training in Azure Custom Speech, see the documentation page "Audio + Human-Labeled Transcript Data for Training or Testing".
Hope this helps. If you have any follow-up questions, please let me know. I would be happy to help.
Please do not forget to "up-vote" wherever the information provided helps you, as this can be beneficial to other community members.