View and update transcriptions


Because of the Azure Media Services (AMS) retirement announcement, Azure AI Video Indexer is adjusting some of its features. See Changes related to Azure Media Service (AMS) retirement to understand what this means for your Azure AI Video Indexer account, and see the Preparing for AMS retirement: VI update and migration guide.

This article explains how to insert or remove a transcript line in the Azure AI Video Indexer website. It also shows how to view word-level information.

Insert or remove transcript lines in the Azure AI Video Indexer website

This section explains how to insert or remove a transcript line in the Azure AI Video Indexer website.

Add new line to the transcript timeline

While in edit mode, hover between two transcription lines. If there is a gap between the end time of one transcript line and the start time of the following line, the add new transcription line option appears.

Screenshot of how to add new transcription.

After you select the add new transcription line option, you can enter the text and the time stamp for the new line. Enter the text, choose the time stamp, and select save. The default time stamp is the gap between the previous and the next transcript line.

Screenshot of a new transcript time stamp line.

If there isn't an option to add a new line, adjust the end or start time of the neighboring transcript lines to make room for a new line in the desired place.

Choose an existing line in the transcript, select the three dots icon, select edit, and change the time stamp accordingly.
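
The gap rule described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the website's implementation: the `TranscriptLine` type, the `default_time_range` function, and the sample values are hypothetical, and times are assumed to be in seconds.

```python
from typing import NamedTuple, Optional, Tuple


class TranscriptLine(NamedTuple):
    """Hypothetical stand-in for one transcript line; times are in seconds."""
    start: float
    end: float
    text: str


def default_time_range(
    prev: TranscriptLine, nxt: TranscriptLine
) -> Optional[Tuple[float, float]]:
    """Return the gap between two adjacent lines, or None when there is no room."""
    if nxt.start > prev.end:
        return (prev.end, nxt.start)
    return None


# Illustrative values only.
first = TranscriptLine(0.0, 2.0, "first line")
second = TranscriptLine(3.5, 5.0, "second line")
print(default_time_range(first, second))  # (2.0, 3.5)
```

When the function returns `None`, the lines touch or overlap, which corresponds to the case where the end or start time of an existing line has to be adjusted first.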


New lines will not appear as part of the From transcript edits in the Content model customization under languages.

When you add a new line through the API, the speaker name can be set as free text. For example, Speaker 1 can now become Adam.

Edit existing line

While in edit mode, select the three dots icon. The editing options now include not just the text but also the time stamp, accurate to the millisecond.

Delete line

Lines can now be deleted through the same three dots icon.

Consolidate two lines as one

To consolidate two lines that you believe should appear as one:

  1. Go to the second of the two lines and select edit.
  2. Copy the text.
  3. Delete the line.
  4. Go to the first line, select edit, paste the text, and save.
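
For transcript data handled outside the website, the same consolidation can be sketched as follows. The list-of-dictionaries shape and the `text` key are assumptions made for illustration and don't reflect an Azure AI Video Indexer schema. As in the steps above, only the text moves; the time stamps of the first line stay as they are.

```python
from typing import Dict, List


def consolidate(lines: List[Dict[str, str]], index: int) -> List[Dict[str, str]]:
    """Append the text of lines[index + 1] to lines[index] and drop the second line.

    Returns a new list; the input list is not modified.
    """
    if not 0 <= index < len(lines) - 1:
        raise IndexError("index must point at a line that has a following line")
    first, second = lines[index], lines[index + 1]
    # Copy the first line and replace only its text (hypothetical "text" key).
    merged = dict(first, text=f"{first['text']} {second['text']}")
    return lines[:index] + [merged] + lines[index + 2:]


# Illustrative values only.
lines = [{"text": "Thanks for"}, {"text": "joining"}, {"text": "Next line"}]
print(consolidate(lines, 0))
```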

Examine word-level transcription information

This section shows how to examine word-level transcription information based on the sentences and phrases that Azure AI Video Indexer identified. Each phrase is broken into words, and each word has the following information associated with it:

Name | Description | Example
Word | A word from a phrase. | "thanks"
Confidence | How confident Azure AI Video Indexer is that the word is correct. | 0.80127704
Offset | The time offset from the beginning of the video to where the word starts. | PT0.86S
Duration | The duration of the word. | PT0.28S
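
The following Python sketch shows one way to read these fields from the transcript.speechservices.json artifact described in the next section. The helper names (`duration_to_seconds`, `ticks_to_seconds`, `iter_words`) are illustrative and not part of any Azure SDK. The parser only handles time-only ISO 8601 durations such as PT0.86S, and the ticks-per-second value is an assumption inferred from the sample, where PT0.86S pairs with 8600000 ticks.

```python
import json
import re
from typing import Any, Dict, Iterator

# Assumption inferred from the sample values: PT0.86S pairs with 8600000 ticks.
TICKS_PER_SECOND = 10_000_000

# Matches time-only ISO 8601 durations, for example PT0.86S or PT1M2.5S.
_DURATION = re.compile(
    r"^PT(?:(?P<h>\d+(?:\.\d+)?)H)?"
    r"(?:(?P<m>\d+(?:\.\d+)?)M)?"
    r"(?:(?P<s>\d+(?:\.\d+)?)S)?$"
)


def duration_to_seconds(value: str) -> float:
    """Convert a time-only ISO 8601 duration string to seconds."""
    match = _DURATION.match(value)
    if match is None or not any(match.groupdict().values()):
        raise ValueError(f"unsupported duration: {value!r}")
    parts = {k: float(v) if v else 0.0 for k, v in match.groupdict().items()}
    return parts["h"] * 3600 + parts["m"] * 60 + parts["s"]


def ticks_to_seconds(ticks: int) -> float:
    """Convert a tick count to seconds under the assumption stated above."""
    return ticks / TICKS_PER_SECOND


def iter_words(transcript: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield every word entry under RecognizedPhrases > NBest > Words."""
    for phrase in transcript.get("RecognizedPhrases", []):
        for candidate in phrase.get("NBest", []):
            yield from candidate.get("Words", [])


# Inline sample using values from this article. For a downloaded file, use:
#   with open("transcript.speechservices.json", encoding="utf-8") as f:
#       transcript = json.load(f)
SAMPLE = json.loads("""
{"RecognizedPhrases": [{"NBest": [{"Words": [
  {"Word": "thanks", "Confidence": 0.80127704, "Offset": "PT0.86S", "Duration": "PT0.28S"},
  {"Word": "for", "Confidence": 0.93965703, "Offset": "PT1.15S", "Duration": "PT0.13S"}
]}]}]}
""")

for word in iter_words(SAMPLE):
    start = duration_to_seconds(word["Offset"])
    end = start + duration_to_seconds(word["Duration"])
    print(f'{word["Word"]}: {start:.2f}s to {end:.2f}s, confidence {word["Confidence"]:.2f}')
```

Using `.get` with an empty default keeps the walk from failing on phrases that lack an NBest or Words entry. Whether such phrases occur in real output is not stated in this article.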

Get and view the transcript

  1. Sign in to the Azure AI Video Indexer website.
  2. Select a video.
  3. In the top-right corner, select the down arrow, and then select Artifacts (ZIP).
  4. Download the artifacts.
  5. Unzip the downloaded file, browse to the unzipped files, and open transcript.speechservices.json.
  6. Format and view the JSON.
  7. Find RecognizedPhrases > NBest > Words and look for the information that interests you.

"RecognizedPhrases": [
  {
    "RecognitionStatus": "Success",
    "Channel": 0,
    "Speaker": 1,
    "Offset": "PT0.86S",
    "Duration": "PT11.01S",
    "OffsetInTicks": 8600000,
    "DurationInTicks": 110100000,
    "NBest": [
      {
        "Confidence": 0.82356554,
        "Lexical": "thanks for joining ...",
        "ITN": "thanks for joining ...",
        "MaskedITN": "",
        "Display": "Thanks for joining ...",
        "Words": [
          {
            "Word": "thanks",
            "Confidence": 0.80127704,
            "Offset": "PT0.86S",
            "Duration": "PT0.28S",
            "OffsetInTicks": 8600000,
            "DurationInTicks": 2800000
          },
          {
            "Word": "for",
            "Confidence": 0.93965703,
            "Offset": "PT1.15S",
            "Duration": "PT0.13S",
            "OffsetInTicks": 11500000,
            "DurationInTicks": 1300000
          },
          {
            "Word": "joining",
            "Confidence": 0.97060966,
            "Offset": "PT1.29S",
            "Duration": "PT0.31S",
            "OffsetInTicks": 12900000,
            "DurationInTicks": 3100000
          }