Text Normalization for Acronyms in Custom Speech

Question

Stephen Clarke 1

I am currently training a Custom Speech model in Speech Studio using audio + human-labeled transcript data. According to the text normalization documentation for US English, acronyms that are spoken letter by letter should be transcribed with a space between each letter.

Example
Original text: "Water is H20"
Text after normalization: "Water is H 2 O"

When testing my current model, however (this domain is very acronym heavy), I am seeing cases where the acronym is recognized correctly but without spaces between the letters, which increases the WER.

Example:
Human-labeled transcription: "i d one"
Model 1: "id one"
Result: two deletions and one substitution
Expectation: zero errors

The main problem is the inconsistency: some acronyms are transcribed with spaces and others are not, so there is no easy way for me to ignore these "errors." Is there any way to reduce how often this happens? I have also tried adding a pronunciation file containing these common acronyms, but the results did not change.

Answer 1

Ramr-msft 17,826

@Stephen Clarke Thanks for the question. Here is the documentation on preparing training data to improve recognition with Custom Speech.
https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/custom-speech-overview

Stephen Clarke 1 Reputation point

2022-06-14T19:21:57.77+00:00

I am sorry but perhaps I didn't make this question clear. I was following the documentation you linked. The example I provided was from this link specifically: https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-custom-speech-human-labeled-transcriptions

My data is in the format described in that link, but I am still getting results that contradict it.
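
If the spacing cannot be made consistent on the model side, one possible workaround for the scoring problem is to normalize both the human-labeled transcription and the model output yourself before counting errors. The Python below is only a minimal sketch of that idea, not something the linked documentation prescribes. It assumes that any run of single-character tokens is a spelled-out acronym, and the function names `collapse_spelled_runs` and `word_errors` are made up for illustration; they are not part of the Speech service.

```python
def collapse_spelled_runs(text: str) -> list[str]:
    """Lowercase, split on whitespace, and join runs of single-character
    tokens into one token, so "i d one" and "id one" compare equal.
    Assumption: a run of single characters is a spelled-out acronym."""
    out, run = [], []
    for tok in text.lower().split():
        if len(tok) == 1 and tok.isalnum():
            run.append(tok)          # keep collecting the spelled-out run
            continue
        if run:
            out.append("".join(run))  # flush the run as a single token
            run = []
        out.append(tok)
    if run:
        out.append("".join(run))
    return out


def word_errors(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance: the minimum number of
    substitutions, deletions, and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # match or substitution
        prev = cur
    return prev[-1]


reference = "i d one"    # human-labeled transcription
hypothesis = "id one"    # model output
errors = word_errors(collapse_spelled_runs(reference),
                     collapse_spelled_runs(hypothesis))
print(errors)  # 0
```

This only changes an offline error count that you compute yourself; it does not alter the WER shown in Speech Studio. The heuristic also merges genuine one-letter words such as "a" or "i" when they sit next to other single-character tokens, so an explicit list of known acronyms may be a safer choice for real data.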
