Disable Russian number denormalization

Question

python azure.cognitiveservices.speech as speechsdk

Hi! When recognizing Russian speech, words are automatically converted to numbers (instead of "он купил два стула" I get "он купил 2 стула"). I can't find any option in the documentation to turn this off.
thanks!

Accepted Answer

@Rostislav Kolobov The API's format query parameter defaults to simple. Simple results include RecognitionStatus, DisplayText, Offset, and Duration. Detailed responses include four different representations of the display text.

With simple, DisplayText holds the recognized text after capitalization, punctuation, inverse text normalization (ITN), and profanity masking. ITN is the step that turns spoken number words into digits, which is why you get "2" instead of "два". If the format parameter is set to detailed, the result also includes an NBest list whose entries carry the Lexical, ITN, MaskedITN, and Display forms, so the text is available both with and without normalization. The Lexical form contains the words actually recognized rather than their shortened forms; it also has no capitalization or punctuation.

So, to summarize: pass the format parameter as detailed and read the Lexical output from the NBest entries where you need the spelled-out words.
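Since the question uses the Python SDK rather than the REST API directly, here is a minimal sketch of the same idea there. It assumes that setting output_format to OutputFormat.Detailed on the SpeechConfig is the SDK counterpart of format=detailed, and that result.json carries the NBest list described above. The helper names, the key/region/wav_path arguments, and the SAMPLE payload are all made up for illustration; SAMPLE is hand-written, not captured from the service.

```python
import json


def lexical_from_detailed_json(raw_json: str) -> str:
    """Return the Lexical text of the top NBest entry, or "" if there is none."""
    payload = json.loads(raw_json)
    n_best = payload.get("NBest") or []
    return n_best[0].get("Lexical", "") if n_best else ""


def recognize_lexical(key: str, region: str, wav_path: str) -> str:
    """Recognize one utterance from a WAV file and return its Lexical form."""
    # Imported lazily so the JSON helper above works without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    speech_config.speech_recognition_language = "ru-RU"
    # Assumed SDK counterpart of the REST "format=detailed" parameter.
    speech_config.output_format = speechsdk.OutputFormat.Detailed

    audio_config = speechsdk.audio.AudioConfig(filename=wav_path)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )
    result = recognizer.recognize_once()
    if result.reason != speechsdk.ResultReason.RecognizedSpeech:
        return ""
    return lexical_from_detailed_json(result.json)


# Illustrative payload only: hand-written to mirror the fields named above.
SAMPLE = json.dumps(
    {
        "RecognitionStatus": "Success",
        "DisplayText": "Он купил 2 стула.",
        "NBest": [
            {"Lexical": "он купил два стула", "ITN": "он купил 2 стула"}
        ],
    },
    ensure_ascii=False,
)
```

Calling lexical_from_detailed_json(SAMPLE) returns the spelled-out "он купил два стула", while DisplayText in the same payload still shows the digit. Keeping the JSON parsing separate from the recognizer call makes it possible to check the extraction logic without a subscription key.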

