Custom Speech: What happens if audio data for training exceeds 60 seconds?

Anonymous
2022-02-02T02:47:39.753+00:00

I am training custom speech models using audio + human-labeled transcript data.
According to the docs, "each training file can't exceed 60 seconds, or it will error out", but all of my data is longer than 60 seconds (about 5-10 minutes per file), and strangely I could still upload it and train models with it. (Here's the link about the limitation; please scroll down to the "Note" section.)
So my question is: what happens if audio data for training exceeds 60 seconds? It looks perfect on the console, but is something going wrong inside the training loop? (For instance, is the audio data being cut off somewhere?)
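
For reference, here is a minimal sketch of how the files over the documented 60-second limit could be listed locally before uploading. It assumes the training audio is PCM WAV and uses only Python's standard `wave` module; the names `MAX_SECONDS`, `wav_duration_seconds`, and `files_over_limit` are illustrative and not part of any Speech service API.

```python
import wave
from pathlib import Path

# Limit quoted from the docs in this question (assumption: it applies per file).
MAX_SECONDS = 60.0


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        # duration = number of frames / frames per second
        return wav.getnframes() / wav.getframerate()


def files_over_limit(folder, limit=MAX_SECONDS):
    """Return (filename, duration) pairs for WAV files longer than `limit`."""
    over = []
    for path in sorted(Path(folder).glob("*.wav")):
        duration = wav_duration_seconds(path)
        if duration > limit:
            over.append((path.name, duration))
    return over
```

Calling `files_over_limit("training_audio")` on a hypothetical folder of training files returns only the files longer than 60 seconds, which makes it easy to see how much of a dataset falls outside the documented limit.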

Azure AI Speech
An Azure service that integrates speech processing into apps and services.

Answer accepted by question author
  1. romungi-MSFT 49,101 Reputation points Microsoft Employee Moderator
    2022-02-04T12:38:09.817+00:00

    @MlyamaeYuichi-6843 Based on feedback from the product group, using larger files is not an issue: the process still uses these files to improve the custom terms in your data, and nothing beyond 60 seconds is ignored. Shorter files help in training the acoustic part of the model.

    To summarize, the text files or transcripts play the bigger role in creating the model, so it is important to make sure the text is correct.
    The audio files complement the transcripts by helping train the model on the audio quality or background that you would probably have in all your future files. Shorter audio is preferable for training the acoustic model.

    I hope this helps.

