@Stephen Clarke Thanks for the question. The following document explains how to prepare training data to improve recognition with Custom Speech.
https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/custom-speech-overview
Text Normalization for Acronyms in Custom Speech
I am currently training a Custom Speech model in Speech Studio using audio + transcript data. According to the text normalization for US English documentation, acronyms that are spoken letter by letter should be transcribed with a space between each letter.
Example
Original text: "Water is H2O"
Text after normalization: "Water is H 2 O"
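For transcripts prepared in bulk, this rule can be applied with a small script. The following is only a sketch: the function name `space_out_acronyms` and the idea of keeping a hand-maintained list of letter-by-letter acronyms are my own assumptions, not something the Speech service provides.

```python
import re


def space_out_acronyms(text, acronyms):
    """Insert a space between the characters of each listed acronym.

    `acronyms` is a hypothetical, hand-maintained list of terms that are
    spoken letter by letter in the recordings.
    """
    for acronym in acronyms:
        spaced = " ".join(acronym)
        # Match the acronym only as a whole word.
        pattern = r"\b" + re.escape(acronym) + r"\b"
        # A function replacement avoids any backslash handling in `spaced`.
        text = re.sub(pattern, lambda _match, s=spaced: s, text)
    return text


print(space_out_acronyms("Water is H2O", ["H2O"]))
```

Acronyms that are pronounced as words should be left out of the list so they stay unchanged.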
However, when testing my current model (this domain is very acronym heavy), I am seeing cases where the acronym is transcribed correctly but without spaces between the letters, which increases the WER.
Example:
Human-labeled transcription: "i d one"
Model 1: "id one"
Result: two deletions and one substitution
Expectation: zero errors
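To illustrate what I mean by ignoring these "errors", here is a rough sketch of scoring outside the studio. It joins runs of single-character tokens in both the reference and the model output before counting word errors, so a spelled-out acronym and its unspaced form compare equal. The helper names `collapse_spelled_letters` and `word_errors` are made up for this example, and the error count is a plain word-level edit distance, which is an assumption about how to score and not necessarily how the studio aligns words.

```python
def collapse_spelled_letters(text):
    """Join runs of single-character tokens, e.g. "i d one" -> "id one"."""
    out, run = [], []
    for tok in text.lower().split():
        if len(tok) == 1:
            run.append(tok)
        else:
            if run:
                out.append("".join(run))
                run = []
            out.append(tok)
    if run:
        out.append("".join(run))
    return " ".join(out)


def word_errors(reference, hypothesis):
    """Word-level edit distance: substitutions + deletions + insertions."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # match or substitution
        prev = cur
    return prev[-1]


reference, hypothesis = "i d one", "id one"
raw = word_errors(reference, hypothesis)
normalized = word_errors(collapse_spelled_letters(reference),
                         collapse_spelled_letters(hypothesis))
print(raw, normalized)
```

The raw comparison is penalized while the normalized one is not. The obvious drawback is that genuine one-letter words such as "a" or "i" are merged with any neighbouring single letters, so this only works as a rough post-hoc check.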
The main problem is the inconsistency: some acronyms are transcribed with spaces and others are not, so there is no easy way for me to ignore these "errors." Is there any way to reduce how often this happens? I have also tried adding a pronunciation file containing these common acronyms, but the results did not change.
1 answer
Ramr-msft 17,741 Reputation points
2022-06-14T16:21:33.903+00:00