Text Normalization for Acronyms in Custom Speech

Stephen Clarke
2022-06-13T15:01:57.533+00:00

I am currently training a Custom Speech model in Speech Studio using audio + human-labeled transcript data. According to the US English text normalization documentation, acronyms that are spoken letter by letter should be transcribed with a space between each letter.

Example:
Original text: "Water is H20"
Text after normalization: "Water is H 2 O"
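The normalization rule above can be sketched as a small preprocessing step for human-labeled transcripts. This is a minimal sketch, not Azure's normalizer: `spell_out_acronyms` and the acronym set are hypothetical, and it assumes you maintain your own list of tokens that are spoken letter by letter in your domain. It does not handle punctuation attached to a token.

```python
def spell_out_acronyms(text: str, acronyms: set[str]) -> str:
    """Split each listed acronym into space-separated characters.

    Hypothetical helper: `acronyms` is a hand-maintained set of tokens
    (in upper case) that speakers pronounce letter by letter.
    """
    out = []
    for token in text.split():
        if token.upper() in acronyms:
            # "H2O" -> "H 2 O": one token per spoken character
            out.append(" ".join(token))
        else:
            out.append(token)
    return " ".join(out)


print(spell_out_acronyms("Water is H2O", {"H2O"}))  # Water is H 2 O
print(spell_out_acronyms("ID one", {"ID"}))         # I D one
```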

When testing my current model, however (this domain is very acronym-heavy), I am seeing cases where the acronym is transcribed correctly but without spaces between the letters, which increases the WER.

Example:
Human-labeled transcription: "i d one"
Model 1: "id one"
Result: two deletions and one substitution
Expectation: zero errors
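For reference, a standard minimum-edit-distance (Levenshtein) word alignment counts this pair as one substitution plus one deletion, i.e. 2 errors over 3 reference words; the Speech Studio test report may attribute the errors differently. The sketch below is a generic WER computation for checking such cases offline, not Azure's scoring code; `wer_breakdown` is a hypothetical name.

```python
def wer_breakdown(reference: str, hypothesis: str) -> dict:
    """Word-level Levenshtein alignment with per-error-type counts."""
    ref, hyp = reference.split(), hypothesis.split()
    n, m = len(ref), len(hyp)
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)

    # Backtrack along one optimal path to count each error type
    i, j = n, m
    subs = dels = ins = 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            subs += ref[i - 1] != hyp[j - 1]  # adds 0 on an exact match
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return {"substitutions": subs, "deletions": dels,
            "insertions": ins, "wer": d[n][m] / max(n, 1)}


print(wer_breakdown("i d one", "id one"))
# {'substitutions': 1, 'deletions': 1, 'insertions': 0, 'wer': 0.6666666666666666}
```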

The main problem is the inconsistency: some acronyms are transcribed with spaces and others are not, so there is no easy way for me to ignore these "errors." Is there any way to reduce how often this happens? I have also tried adding a pronunciation file containing these common acronyms, but the results did not change.
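One possible workaround, on the evaluation side only, is to normalize both the reference and the hypothesis the same way before scoring them. The hypothetical `collapse_spelled_acronyms` below (an assumption, not part of the Speech SDK or Speech Studio) lowercases the text and merges runs of two or more single-character tokens, so "i d one" and "id one" compare equal. It does not change what the model outputs or the WER reported in Speech Studio, and it will also merge genuinely separate single characters (for example "a b c" in a list).

```python
def collapse_spelled_acronyms(text: str) -> str:
    """Merge runs of 2+ single-character tokens: "i d one" -> "id one"."""
    out: list[str] = []
    run: list[str] = []  # pending single-character tokens
    for token in text.lower().split() + [""]:  # "" sentinel flushes the last run
        if len(token) == 1 and token.isalnum():
            run.append(token)
            continue
        if len(run) >= 2:
            out.append("".join(run))  # "i", "d" -> "id"
        else:
            out.extend(run)  # a lone "i" or "a" stays a separate word
        run = []
        if token:
            out.append(token)
    return " ".join(out)


ref = collapse_spelled_acronyms("i d one")
hyp = collapse_spelled_acronyms("id one")
print(ref == hyp)  # True
```

Feed the normalized strings to whichever WER tool you already use, so spaced and unspaced acronym forms no longer count as errors.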


1 answer

    Ramr-msft
    2022-06-14T16:21:33.903+00:00

    @Stephen Clarke Thanks for the question. Here is the documentation on preparing training data to improve recognition with Custom Speech:
    https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/custom-speech-overview

