Output format of Speech SDK

Pooja Kamra 11 Reputation points
2022-02-21T18:31:59.413+00:00

Hi, we are using the speech-to-text functionality of the Azure Speech API and want to get the detailed output from it.
According to the documentation, the API returns the recognized text in four forms when the detailed output format is set:

  • Lexical - The lexical form of the recognized text: the actual words recognized.
  • ITN - The inverse-text-normalized (ITN) or canonical form of the recognized text, with phone numbers, numbers, abbreviations ("doctor smith" to "dr smith"), and other transformations applied.
  • MaskedITN - The ITN form with profanity masking applied, if requested.
  • Display - The display form of the recognized text, with punctuation and capitalization added.

But in the Display form we are also getting ITN applied to the text, e.g.:
Parker and I got hit by a drunk driver going 62 miles an hour

Please help us understand the output forms.
We want only capitalized text with punctuation marks. ITN is not required.

Azure AI Speech

1 answer

  1. Ramr-msft 17,736 Reputation points
    2022-02-22T17:01:36.573+00:00

    @Pooja Kamra Thanks for the question. Once you set the output format to “Detailed”, you can get the N-Best list as JSON.

    Here is an example for getting nBest.

    Example output from running this:

    We recognized: This is a test of the emergency broadcast system.
    Detailed JSON: {"Duration":32600000,"NBest":[
    {"Confidence":0.9719818830490112,"Display":"This is a test of the emergency broadcast system.","ITN":"this is a test of the emergency broadcast system","Lexical":"this is a test of the emergency broadcast system","MaskedITN":"This is a test of the emergency broadcast system"},
    {"Confidence":0.9169924259185791,"Display":"is this is a test of the emergency broadcast system","ITN":"is this is a test of the emergency broadcast system","Lexical":"is this is a test of the emergency broadcast system","MaskedITN":"is this is a test of the emergency broadcast system"},
    {"Confidence":0.9146955609321594,"Display":"this is a test of the emergency broadcast system uh","ITN":"this is a test of the emergency broadcast system uh","Lexical":"this is a test of the emergency broadcast system uh","MaskedITN":"this is a test of the emergency broadcast system uh"},
    {"Confidence":0.9065390229225159,"Display":"this is a test of the emergency broadcast systems","ITN":"this is a test of the emergency broadcast systems","Lexical":"this is a test of the emergency broadcast systems","MaskedITN":"this is a test of the emergency broadcast systems"},
    {"Confidence":0.9085215330123901,"Display":"this is a test of the emergency broadcast system 's","ITN":"this is a test of the emergency broadcast system 's","Lexical":"this is a test of the emergency broadcast system 's","MaskedITN":"this is a test of the emergency broadcast system 's"}]
    ,"Offset":2300000,"RecognitionStatus":"Success"}
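
To compare the forms side by side, the detailed JSON can be parsed with any JSON library. The following is a minimal sketch in Python, assuming the payload has the shape shown above. The helper name `best_forms` is hypothetical and not part of the Speech SDK, and the sample string is shortened to the first two N-Best entries from the output above.

```python
import json

# The four text forms carried by each NBest entry in the detailed output.
FORMS = ("Lexical", "ITN", "MaskedITN", "Display")


def best_forms(detailed_json: str) -> dict:
    """Return the text forms of the highest-confidence NBest entry.

    Hypothetical helper for illustration; returns an empty dict when the
    payload has no NBest list (for example, when nothing was recognized).
    """
    payload = json.loads(detailed_json)
    candidates = payload.get("NBest", [])
    if not candidates:
        return {}
    # Select by confidence explicitly rather than relying on list order.
    best = max(candidates, key=lambda entry: entry.get("Confidence", 0.0))
    return {form: best.get(form) for form in FORMS}


# Shortened copy of the example output above (first two NBest entries only).
sample = (
    '{"Duration":32600000,"NBest":['
    '{"Confidence":0.9719818830490112,'
    '"Display":"This is a test of the emergency broadcast system.",'
    '"ITN":"this is a test of the emergency broadcast system",'
    '"Lexical":"this is a test of the emergency broadcast system",'
    '"MaskedITN":"This is a test of the emergency broadcast system"},'
    '{"Confidence":0.9169924259185791,'
    '"Display":"is this is a test of the emergency broadcast system",'
    '"ITN":"is this is a test of the emergency broadcast system",'
    '"Lexical":"is this is a test of the emergency broadcast system",'
    '"MaskedITN":"is this is a test of the emergency broadcast system"}]'
    ',"Offset":2300000,"RecognitionStatus":"Success"}'
)

if __name__ == "__main__":
    for form, text in best_forms(sample).items():
        print(f"{form}: {text}")
```

Printing Lexical next to Display this way makes it easy to see which transformations were applied to a given utterance.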
