Disable Russian number denormalization

Rostislav Kolobov 21 Reputation points
2022-04-19T05:33:42.937+00:00

python azure.cognitiveservices.speech as speechsdk

Hi! When recognizing Russian speech, words are automatically converted to numbers (instead of "он купил два стула" I get "он купил 2 стула"). I can't find any option in the documentation to turn this off.
Thanks!

Azure AI Speech
An Azure service that integrates speech processing into apps and services.

Accepted answer
  1. romungi-MSFT 43,696 Reputation points Microsoft Employee
    2022-04-19T10:15:32.773+00:00

    @Rostislav Kolobov The format query parameter of the API defaults to simple. A simple result includes RecognitionStatus, DisplayText, Offset, and Duration. A detailed result includes four different representations of the recognized text.

    With simple, DisplayText holds the recognized text after capitalization, punctuation, inverse text normalization (ITN), and profanity masking have been applied. Inverse text normalization is the step that converts spoken words to shorter written forms, which is why "два" comes back as "2" in your case. If the format parameter is set to detailed, the result also contains an NBest array whose entries carry the Lexical, ITN, MaskedITN, and Display forms, so the text is available both with and without normalization. The Lexical form gives the actual words recognized, not their shorter forms.

    To summarize: pass the format parameter as detailed and read the Lexical value from the NBest entries wherever you need the unnormalized text.
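Since the question uses the Python SDK rather than the REST query parameter, here is a minimal sketch of the same idea, assuming that setting `output_format` to `OutputFormat.Detailed` on the `SpeechConfig` is the SDK-side counterpart of `format=detailed` and that `result.json` carries the raw detailed response. The helper names, the `key`, `region`, and `wav_path` arguments, and the sample payload are all hypothetical and only there for illustration.

```python
import json


def lexical_from_detailed_json(result_json: str) -> str:
    """Return the Lexical form of the top NBest entry, or "" if there is none."""
    payload = json.loads(result_json)
    nbest = payload.get("NBest") or []
    return nbest[0].get("Lexical", "") if nbest else ""


def recognize_lexical(key: str, region: str, wav_path: str) -> str:
    """Recognize one utterance from a WAV file and return its Lexical text."""
    # Imported here so the JSON helper above also works without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    speech_config.speech_recognition_language = "ru-RU"
    # Assumed SDK equivalent of passing format=detailed to the REST API.
    speech_config.output_format = speechsdk.OutputFormat.Detailed

    audio_config = speechsdk.audio.AudioConfig(filename=wav_path)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )
    result = recognizer.recognize_once()
    if result.reason != speechsdk.ResultReason.RecognizedSpeech:
        return ""
    # result.json is the service response as a JSON string.
    return lexical_from_detailed_json(result.json)


# Hypothetical, hand-written payload shaped like a detailed response;
# it is not output captured from the service.
sample = json.dumps(
    {
        "RecognitionStatus": "Success",
        "DisplayText": "Он купил 2 стула.",
        "NBest": [
            {
                "Confidence": 0.9,
                "Lexical": "он купил два стула",
                "ITN": "он купил 2 стула",
                "MaskedITN": "он купил 2 стула",
                "Display": "Он купил 2 стула.",
            }
        ],
    },
    ensure_ascii=False,
)
lexical = lexical_from_detailed_json(sample)  # "он купил два стула"
```

Note that the Lexical form is lowercase and unpunctuated, so if you need capitalization and punctuation along with spelled-out numbers you would have to restore them yourself.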

