Best Practices for Structuring Multi-Speaker Audio and Transcripts in Custom Azure Speech Models
Kay Wiberg
Hi!
My team and I have a few questions regarding Azure Speech, specifically about how to structure datasets for training and testing custom speech models.
Training Data
- Please confirm the best format for audio and transcripts when training a custom speech model on multi-speaker data, including how the .zip file should be structured. Should the .txt transcript file contain only one line per audio file, without timestamps, even if that audio file contains conversation turns from two different speakers? Could you please provide an illustrative example of such a transcript?
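To make the question concrete, here is a sketch of the layout we currently assume from the docs: a flat .zip containing the audio files plus a single trans.txt, where each line maps one audio file name to its full transcript, tab-separated, with a two-speaker conversation collapsed into one line. The file names and transcript text below are made up for illustration.

```python
import io
import zipfile

# Assumed layout (please correct us if wrong): one trans.txt at the zip root,
# each line "<audio file name><TAB><full transcript>", no timestamps, and a
# two-speaker conversation concatenated into a single normalized line.
transcript_lines = [
    "call_001.wav\thello thanks for calling how can i help you i would like to check my order status",
    "call_002.wav\tgood morning i have a question about billing sure let me pull up your account",
]

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("trans.txt", "\n".join(transcript_lines) + "\n")
    # Audio files sit alongside trans.txt at the zip root; empty placeholders
    # here stand in for real RIFF WAV files.
    zf.writestr("call_001.wav", b"")
    zf.writestr("call_002.wav", b"")

with zipfile.ZipFile(buf) as zf:
    print(sorted(zf.namelist()))
```

Is this the structure you would recommend, or should multi-speaker audio be handled differently?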
Test Data
- I understand that training data transcripts for a custom speech model should be normalized (lower-cased, no punctuation). Should the test data transcripts instead be true-cased, punctuated, and de-tokenized?
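For reference, this is roughly the normalization step we currently apply to training transcripts; the function name and the exact rules (keep in-word apostrophes, drop other punctuation) are our own assumptions, not anything from the docs.

```python
import re

def normalize_training_transcript(text: str) -> str:
    """Lower-case a transcript and strip punctuation, keeping apostrophes
    inside words and collapsing runs of whitespace - our current assumption
    of what 'normalized' training text means."""
    text = text.lower()
    text = re.sub(r"[^\w\s']", "", text)  # drop punctuation except apostrophes
    return re.sub(r"\s+", " ", text).strip()

print(normalize_training_transcript("Hello, thanks for calling! How can I help?"))
# → hello thanks for calling how can i help
```

The question is whether the test set should skip this step entirely and keep the original casing and punctuation.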
Thank you!
Azure AI Speech