I checked my input data again and it turns out my audio was too long. The individual files were each slightly longer than 40 seconds.
It was user error in the end, but it would still be a good idea to implement more descriptive error messages. That could have saved me a few days of waiting.
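
For anyone who hits the same thing, here is a rough sketch of the kind of pre-flight check I have in mind. It assumes WAV input and uses only Python's standard `wave` module. The names `check_duration` and `MAX_SECONDS` are made up for illustration, and the 40-second value is just a placeholder, since I don't know the exact limit the tool enforces.

```python
import wave

# Hypothetical limit: a placeholder, not the tool's documented maximum.
MAX_SECONDS = 40.0


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_duration(path, max_seconds=MAX_SECONDS):
    """Return the clip duration, or raise a descriptive ValueError if it is too long."""
    duration = wav_duration_seconds(path)
    if duration > max_seconds:
        raise ValueError(
            f"{path}: audio is {duration:.2f} s long, which exceeds the "
            f"{max_seconds:.2f} s limit; trim or split the file."
        )
    return duration
```

Running `check_duration` over every input file before starting a job would surface the problem immediately, with the file name and the measured length in the message, instead of failing later with something generic.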