Hi, the fix has been rolled out. Please check and let us know if you have any further questions or concerns. Thanks.
Adding pronunciation to custom model corrupts the recognition
I have two trained models built from almost the same data and the same base model, 20201019. The only difference is the pronunciation data, which was included for one model and not for the other.
The model trained with pronunciation data appends a long run of hex characters after each recognized acronym.
For example:
Model trained with pronunciation data (model ID: 4e87c4ec-cbfb-4763-b522-55c0c7a5593d ) returns:
- Sdapffbfedcbafcbedeccbcffeadbca, Pdcpbbadbadabdbdfabecaeaebaf, Rlcdecabeaadceffaecaabf
Note: each acronym is recognized correctly, but the model then continues with a long run of hex characters. Where do those characters come from? (There is no background noise in the input audio.)
Model trained without pronunciation data (model ID: a7291998-cd85-4d38-b8f7-9e8f7143d291 ) returns:
- SDAPPDCP RLC
Expected:
- SDAP, PDCP, RLC
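In case it helps with reproducing or regression-testing this, here is a minimal sketch of how the symptom could be detected automatically in a transcript. The function name `find_garbled` and the threshold of 8 trailing characters are my own assumptions, not part of any SDK; it simply looks for a known acronym followed directly by a run of letters a-f.

```python
import re

def find_garbled(text, acronyms, min_run=8):
    """Return (acronym, trailing_run) pairs for tokens that look corrupted.

    A token counts as corrupted when it is a known acronym (any case)
    followed directly by at least `min_run` letters in the range a-f.
    The threshold is an arbitrary assumption chosen for illustration.
    """
    hits = []
    for token in re.findall(r"[A-Za-z]+", text):
        for acronym in acronyms:
            pattern = rf"{re.escape(acronym)}([a-f]{{{min_run},}})"
            match = re.fullmatch(pattern, token, flags=re.IGNORECASE)
            if match:
                hits.append((acronym, match.group(1)))
    return hits

acronyms = ["SDAP", "PDCP", "RLC"]
with_pron = "Sdapffbfedcbafcbedeccbcffeadbca, Pdcpbbadbadabdbdfabecaeaebaf, Rlcdecabeaadceffaecaabf"
without_pron = "SDAPPDCP RLC"

garbled_hits = find_garbled(with_pron, acronyms)
clean_hits = find_garbled(without_pron, acronyms)
```

With the two outputs quoted above, the first transcript is flagged for all three acronyms and the second is not flagged at all.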
The pronunciation file is encoded as UTF-8 with BOM.
It looks like this (other acronyms omitted):
...
SDAP s d a p
PDCP p d c p
RLC r l c
...
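To rule out the file itself, a quick local sanity check might look like the sketch below. It assumes each line holds a display form and a spoken form separated by whitespace, as in the excerpt above; the helper names `has_bom` and `load_pronunciations` are hypothetical, and the sample bytes only mirror the three entries shown here, not the real file.

```python
import codecs

def has_bom(raw: bytes) -> bool:
    """True if the raw file content starts with the UTF-8 byte order mark."""
    return raw.startswith(codecs.BOM_UTF8)

def load_pronunciations(raw: bytes) -> dict:
    """Parse raw pronunciation data into {display form: spoken form}."""
    # "utf-8-sig" drops a leading BOM if present and otherwise acts like UTF-8.
    text = raw.decode("utf-8-sig")
    entries = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        # Assumption: the first whitespace run separates the two fields.
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError(f"line {line_no}: expected display form and spoken form")
        display, spoken = parts
        entries[display] = spoken
    return entries

# Sample bytes mirroring the excerpt above, with a BOM prepended.
sample = codecs.BOM_UTF8 + "SDAP\ts d a p\nPDCP\tp d c p\nRLC\tr l c\n".encode("utf-8")
sample_has_bom = has_bom(sample)
entries = load_pronunciations(sample)
```

If the parsed entries come out clean, as they do for this sample, the file content is probably not the source of the extra characters, although that is only a guess from the client side.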