Speech To Text - Labeled Testing Data For Model Training. Include or exclude numbers and dates

David Revell 1 Reputation point
2021-02-18T22:23:19.7+00:00

Recommendation needed from development. When text transcriptions of the audio are uploaded for training, the text appears to go through a normalization process, and that normalization appears to fail when the audio contains decimals and dates in a row. So the question is: would it be better to trim all decimal numbers and dates out of the training data? They create a lot of false normalization issues (period vs. point vs. dot). Examples: bullet points (1. This is a test. 2. This is another test. 3. This is once again not ok), decimals (This measurement is 2.23 x 3.55 x 1.24), and dates such as 12/07/2009 (twelve oh seven two thousand nine vs. twelve zero seven two oh oh nine vs. twelve zero seven two zero zero nine).
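
To get a sense of how much of the data is affected before deciding whether to trim it, the transcript lines containing these patterns could be flagged and set aside for review. This is only a minimal sketch in Python, assuming each transcript is available as a plain string. The regular expressions and the names `flag_line` and `split_transcripts` are hypothetical and are not part of the Speech service:

```python
import re

# Hypothetical patterns for the three cases in the question.
# Adjust them to the formats that actually occur in your transcripts.
DECIMAL = re.compile(r"\b\d+\.\d+\b")              # e.g. 2.23
DATE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")  # e.g. 12/07/2009
LIST_MARKER = re.compile(r"(?:^|\s)\d+\.\s")       # e.g. "1. " or " 2. "


def flag_line(text):
    """Return the names of the ambiguous patterns found in one transcript."""
    flags = []
    if DECIMAL.search(text):
        flags.append("decimal")
    if DATE.search(text):
        flags.append("date")
    if LIST_MARKER.search(text):
        flags.append("list_marker")
    return flags


def split_transcripts(lines):
    """Split transcripts into (clean, flagged) lists without modifying them."""
    clean, flagged = [], []
    for line in lines:
        (flagged if flag_line(line) else clean).append(line)
    return clean, flagged


if __name__ == "__main__":
    samples = [
        "1. This is a test. 2. This is another test.",
        "This measurement is 2.23 x 3.55 x 1.24",
        "The date is 12/07/2009",
        "This is plain text.",
    ]
    for sample in samples:
        print(flag_line(sample), sample)
```

The flagged lines could then either be dropped or rewritten in spoken form by hand. Which of the two gives better training results is exactly the open question here.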

Azure AI Speech
An Azure service that integrates speech processing into apps and services.