Indexing Media Files with Azure Media Indexer


You should migrate from Indexer v1 and Indexer v2 to the Media Services v3 AudioAnalyzerPreset Basic mode. The Azure Media Indexer and Azure Media Indexer 2 Preview media processors are being retired. For the retirement dates, see this legacy components topic.

Azure Media Indexer enables you to make content of your media files searchable and to generate a full-text transcript for closed captioning and keywords. You can process one media file or multiple media files in a batch.

When indexing content, make sure to use media files that have clear speech (without background music, noise, effects, or microphone hiss). Some examples of appropriate content are: recorded meetings, lectures, or presentations. The following content might not be suitable for indexing: movies, TV shows, anything with mixed audio and sound effects, poorly recorded content with background noise (hiss).

An indexing job can generate the following outputs:

  • Closed caption files in the following formats: TTML and WebVTT.

    Closed caption files include a tag called Recognizability, which scores an indexing job based on how recognizable the speech in the source video is. You can use the value of Recognizability to screen output files for usability. A low score would mean poor indexing results due to audio quality.

  • Keyword file (XML).

This article shows how to create indexing jobs that index a single asset or multiple files.

Using configuration and manifest files for indexing tasks

You can specify more details for your indexing tasks by using a task configuration. For example, you can specify which metadata to use for your media file. The language engine uses this metadata to expand its vocabulary, which greatly improves speech recognition accuracy. You can also specify your desired output files.

You can also process multiple media files at once by using a manifest file.

For more information, see Task Preset for Azure Media Indexer.

Output files

By default, an indexing job generates the following output files. The files are stored in the first output asset.

When there is more than one input media file, Indexer generates a manifest file for the job outputs, named ‘JobResult.txt’. For each input media file, the resulting TTML, WebVTT, and keyword files are sequentially numbered and named using the "Alias."

File name Description
Closed Caption (CC) files in TTML and WebVTT formats.

They can be used to make audio and video files accessible to people with hearing disabilities.

Closed Caption files include a tag called Recognizability which scores an indexing job based on how recognizable the speech in the source video is. You can use the value of Recognizability to screen output files for usability. A low score would mean poor indexing results due to audio quality.
Keyword and info files.

Keyword file is an XML file that contains keywords extracted from the speech content, with frequency and offset information.

Info file is a plain-text file that contains granular information about each term recognized. The first line is special and contains the Recognizability score. Each subsequent line is a tab-separated list of the following data: start time, end time, word/phrase, confidence. The times are given in seconds and the confidence is given as a number from 0-1.

Example line: "1.20 1.45 word 0.67"

These files can be used for a number of purposes, such as performing speech analytics, exposing the content to search engines such as Bing, Google, or Microsoft SharePoint to make the media files more discoverable, or delivering more relevant ads.
JobResult.txt Output manifest, present only when indexing multiple files, containing the following information:


If not all input media files are indexed successfully, the indexing job fails with error code 4000. For more information, see Error codes.
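
The info file layout described above (a special first line, then tab-separated start time, end time, word/phrase, and confidence) can be read with a short script. The following is a minimal sketch in Python, not part of Azure Media Indexer: the names Term, parse_info, and low_confidence are hypothetical, the first line is returned as raw text because its exact format is not specified here, and the header string in the sample is a placeholder. The 0.5 default threshold is only an illustrative choice.

```python
from typing import List, NamedTuple, Tuple


class Term(NamedTuple):
    start: float       # start time in seconds
    end: float         # end time in seconds
    text: str          # recognized word or phrase
    confidence: float  # confidence from 0 to 1


def parse_info(text: str) -> Tuple[str, List[Term]]:
    """Split info-file text into its first line and the parsed term lines."""
    lines = text.splitlines()
    if not lines:
        raise ValueError("info file is empty")
    header = lines[0]  # carries the Recognizability score; kept as raw text
    terms = []
    for line in lines[1:]:
        if not line.strip():
            continue  # tolerate blank lines
        # A line without exactly four tab-separated fields raises ValueError.
        start, end, phrase, confidence = line.split("\t")
        terms.append(Term(float(start), float(end), phrase, float(confidence)))
    return header, terms


def low_confidence(terms: List[Term], threshold: float = 0.5) -> List[Term]:
    """Return the terms whose confidence is below the threshold."""
    return [t for t in terms if t.confidence < threshold]


# Placeholder header plus two term lines in the documented column order.
sample = "<first line>\n1.20\t1.45\tword\t0.67\n1.50\t1.90\tanother phrase\t0.31\n"
header, terms = parse_info(sample)
print(terms[0])
print([t.text for t in low_confidence(terms)])
```

Keeping the first line separate lets a caller apply whatever Recognizability check suits the job without the parser guessing at that line's format.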

Partially Succeeded Job

If not all input media files are indexed successfully, the indexing job will fail with error code 4000. For more information, see Error codes.

The same outputs as for a succeeded job are generated. Refer to the output manifest file to find out which input files failed, according to the Error column values. For input files that failed, the resulting TTML, WebVTT, and keyword files are NOT generated.

Task Preset for Azure Media Indexer

The processing performed by Azure Media Indexer can be customized by providing an optional task preset alongside the task. The following describes the format of this configuration XML.

Name Required Description
input false Asset file(s) that you want to index.

Azure Media Indexer supports the following media file formats: MP4, WMV, MP3, M4A, WMA, AAC, WAV.

You can specify the file name(s) in the name or list attribute of the input element (as shown below). If you do not specify which asset file to index, the primary file is picked. If no primary asset file is set, the first file in the input asset is indexed.

To explicitly specify the asset file name, do:
<input name="TestFile.wmv">

You can also index multiple asset files at once (up to 10 files). To do this:

  1. Create a text file (manifest file) and give it an .lst extension.

  2. Add a list of all the asset file names in your input asset to this manifest file.

  3. Add (upload) the manifest file to the asset.

  4. Specify the name of the manifest file in the input’s list attribute.
    <input list="input.lst">

Note: If you add more than 10 files to the manifest file, the indexing job will fail with the 2006 error code.
metadata false Metadata for the specified asset file(s) used for Vocabulary Adaptation. Useful to prepare Indexer to recognize non-standard vocabulary words such as proper nouns.
<metadata key="..." value="..."/>

You can supply values for predefined keys. Currently the following keys are supported:

“title” and “description” - used for vocabulary adaptation to tweak the language model for your job and improve speech recognition accuracy. The values seed Internet searches to find contextually relevant text documents, using the contents to augment the internal dictionary for the duration of your Indexing task.
<metadata key="title" value="[Title of the media file]" />
<metadata key="description" value="[Description of the media file]" />

Added in version 1.2. Currently, the only supported feature is speech recognition ("ASR").
false The Speech Recognition feature has the following settings keys:



Example value


The natural language to be recognized in the multimedia file.

English, Spanish


A semicolon-separated list of the desired output caption formats (if any).


True; False


A boolean flag specifying whether or not a keyword XML file is required.

True; False.


A boolean flag specifying whether or not to force full captions (regardless of confidence level).

Default is false, in which case words and phrases with a confidence level below 50% are omitted from the final caption outputs and replaced by ellipses ("..."). The ellipses are useful for caption quality control and auditing.

True; False.
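
The manifest steps and the input and metadata elements described above can be sketched in Python. This is an illustration under stated assumptions, not the service's own tooling: build_manifest and build_input_element are hypothetical helper names, the manifest is assumed to hold one asset file name per line, and the metadata elements are assumed to be children of the input element. Only the input and metadata elements are built, because the rest of the configuration document is not shown here.

```python
import xml.etree.ElementTree as ET

MAX_MANIFEST_FILES = 10  # more than 10 entries fails the job with error 2006


def build_manifest(file_names):
    """Return manifest (.lst) text, assuming one asset file name per line."""
    names = [n.strip() for n in file_names if n.strip()]
    if not names:
        # An empty manifest is reported by the service as error 2002.
        raise ValueError("manifest needs at least one file name")
    if len(names) > MAX_MANIFEST_FILES:
        raise ValueError(
            f"manifest lists {len(names)} files; the limit is {MAX_MANIFEST_FILES}"
        )
    return "\n".join(names) + "\n"


def build_input_element(name=None, manifest=None, title=None, description=None):
    """Build an <input> element with optional attributes and metadata children."""
    element = ET.Element("input")
    if name:
        element.set("name", name)      # index one explicitly named asset file
    if manifest:
        element.set("list", manifest)  # index the files listed in the manifest
    for key, value in (("title", title), ("description", description)):
        if value is not None:
            # Assumption: metadata elements are nested inside <input>.
            ET.SubElement(element, "metadata", {"key": key, "value": value})
    return element


# Hypothetical file names and title, for illustration only.
manifest_text = build_manifest(["lecture1.mp4", "lecture2.mp4"])
element = build_input_element(manifest="input.lst", title="Physics lectures")
print(manifest_text, end="")
print(ET.tostring(element, encoding="unicode"))
```

Checking the file count before uploading the manifest surfaces the 10-file limit locally instead of after the job has been submitted.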

Error codes

In the case of an error, Azure Media Indexer should report back one of the following error codes:

Code Name Possible Reasons
2000 Invalid configuration Invalid configuration
2001 Invalid input assets Missing input assets or empty asset.
2002 Invalid manifest Manifest is empty or manifest contains invalid items.
2003 Failed to download media file Invalid URL in manifest file.
2004 Unsupported protocol Protocol of media URL is not supported.
2005 Unsupported file type Input media file type is not supported.
2006 Too many input files There are more than 10 files in the input manifest.
3000 Failed to decode media file Unsupported media codec; corrupted media file; no audio stream in input media.
4000 Batch indexing partially succeeded Some of the input media files failed to be indexed. For more information, see Output files.
other Internal errors Please contact the support team.
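
Client code that inspects a failed job may find it convenient to keep the table above as a lookup. The sketch below is one possible way to do that in Python: the ERROR_CODES dictionary copies the codes and names from the table, while describe_error and is_partial_success are hypothetical helper names. How the error code is obtained from a job is outside the scope of this sketch.

```python
# Codes and names copied from the error code table.
ERROR_CODES = {
    2000: "Invalid configuration",
    2001: "Invalid input assets",
    2002: "Invalid manifest",
    2003: "Failed to download media file",
    2004: "Unsupported protocol",
    2005: "Unsupported file type",
    2006: "Too many input files",
    3000: "Failed to decode media file",
    4000: "Batch indexing partially succeeded",
}


def describe_error(code: int) -> str:
    """Return the table name for a code; any other code is an internal error."""
    return ERROR_CODES.get(code, "Internal errors")


def is_partial_success(code: int) -> bool:
    """True when some input files were indexed and the output manifest is worth reading."""
    return code == 4000


for code in (2006, 4000, 9999):
    print(code, describe_error(code), is_partial_success(code))
```

Treating 4000 separately reflects that a partially succeeded job still produces outputs for the files that were indexed.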

Supported Languages

Currently, the English and Spanish languages are supported.