Adding pronunciation to a custom model corrupts the recognition

Andrei 1 Reputation point
2021-05-14T08:12:40.227+00:00

I have two custom models trained on almost the same data and on the same base model, 20201019. The only difference is that pronunciation data was included for one of them and not for the other.

The model trained with pronunciation data appends a long run of hex characters to each recognized acronym:

For example:
Model trained with pronunciation data (model ID: 4e87c4ec-cbfb-4763-b522-55c0c7a5593d ) returns:

  • Sdapffbfedcbafcbedeccbcffeadbca, Pdcpbbadbadabdbdfabecaeaebaf, Rlcdecabeaadceffaecaabf
    Note: each acronym is recognized correctly, but the model then continues with a long run of hex characters. Where do those characters come from? (There is no background noise in the input audio.)

Model trained without pronunciation data (model ID: a7291998-cd85-4d38-b8f7-9e8f7143d291 ) returns:

  • SDAPPDCP RLC

Expected:

  • SDAP, PDCP, RLC
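
To confirm that the extra text really is a run of hex letters following a known acronym, and to clean transcripts in the meantime, a minimal Python sketch could look like the one below. The acronym list, the `min_run` threshold, and both function names are assumptions made for illustration; nothing here is part of the Speech service or its SDK.

```python
import re

# Hypothetical list of acronyms expected in the audio (taken from the example above).
ACRONYMS = ["SDAP", "PDCP", "RLC"]


def strip_hex_suffix(token, acronyms=ACRONYMS, min_run=8):
    """Return the bare acronym if the token is an acronym followed by a long hex run."""
    for acronym in acronyms:
        if token.lower().startswith(acronym.lower()):
            suffix = token[len(acronym):]
            # Only treat the suffix as an artifact if it is long and purely hex digits.
            if len(suffix) >= min_run and re.fullmatch(r"[0-9a-fA-F]+", suffix):
                return acronym
    return token


def clean_transcript(text):
    """Apply strip_hex_suffix to every word, leaving separators untouched."""
    parts = re.split(r"(\W+)", text)  # the capture group keeps the separators
    return "".join(strip_hex_suffix(part) for part in parts)


raw = "Sdapffbfedcbafcbedeccbcffeadbca, Pdcpbbadbadabdbdfabecaeaebaf, Rlcdecabeaadceffaecaabf"
print(clean_transcript(raw))
```

This only hides the symptom on the client side. It does not explain why the model trained with pronunciation data emits the extra characters.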

The pronunciation file is encoded as UTF-8-BOM and looks like this (other acronyms omitted):
...
SDAP s d a p
PDCP p d c p
RLC r l c
...
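
For reference, a file with this shape can be generated with a short Python sketch like the one below. It assumes that the display form and the spoken form are separated by a tab (the separator cannot be confirmed from the excerpt above) and that each acronym is spoken letter by letter. The function names and the output file name are hypothetical.

```python
from pathlib import Path


def render_pronunciation(acronyms):
    """Build one line per acronym: display form, a tab, then the letters spelled out."""
    lines = [f"{acronym}\t{' '.join(acronym.lower())}" for acronym in acronyms]
    return "\n".join(lines) + "\n"


def encode_with_bom(text):
    """Encode as UTF-8 with a leading byte order mark (Python's 'utf-8-sig' codec)."""
    return text.encode("utf-8-sig")


def write_pronunciation_file(acronyms, path):
    """Write the pronunciation entries to the given path as UTF-8-BOM."""
    Path(path).write_bytes(encode_with_bom(render_pronunciation(acronyms)))


if __name__ == "__main__":
    # Hypothetical output file name.
    write_pronunciation_file(["SDAP", "PDCP", "RLC"], "pronunciation.txt")
```

Writing bytes directly, rather than opening the file in text mode, keeps the line endings exactly as rendered regardless of platform.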

Azure AI Speech
An Azure service that integrates speech processing into apps and services.

1 answer

  1. GiftA-MSFT 11,166 Reputation points
    2021-06-18T22:37:46.303+00:00

    Hi, the fix has been rolled out. Please check and let us know if you have any further questions or concerns. Thanks.
