Indexing Media Files with Azure Media Indexer
Update your Azure Media Services REST API and SDKs to v3 by February 29, 2024. Version 3 of the Azure Media Services REST API and the client SDKs for .NET and Java offers more capabilities than version 2. We're retiring version 2 of the Azure Media Services REST API and the client SDKs for .NET and Java.
Action required: To minimize disruption to your workloads, review the migration guide to transition your code from the version 2 API and SDKs to the version 3 API and SDK before February 29, 2024. After February 29, 2024, Azure Media Services will no longer accept traffic on the version 2 REST API, the ARM account management API version 2015-10-01, or from the version 2 .NET client SDKs. This includes any third-party open-source client SDKs that call the version 2 API. Learn about the latest version, starting with the Media Services v3 Overview.
You should migrate from Indexer v1 and Indexer v2 to the Media Services v3 AudioAnalyzerPreset Basic mode. The Azure Media Indexer and Azure Media Indexer 2 Preview media processors are being retired. For the retirement dates, see the legacy components topic.
Azure Media Indexer enables you to make content of your media files searchable and to generate a full-text transcript for closed captioning and keywords. You can process one media file or multiple media files in a batch.
When indexing content, make sure to use media files that have clear speech (without background music, noise, effects, or microphone hiss). Some examples of appropriate content are recorded meetings, lectures, and presentations. The following content might not be suitable for indexing: movies, TV shows, anything with mixed audio and sound effects, or poorly recorded content with background noise (hiss).
An indexing job can generate the following outputs:
Closed caption files in the following formats: TTML and WebVTT.
Closed caption files include a tag called Recognizability, which scores an indexing job based on how recognizable the speech in the source video is. You can use the value of Recognizability to screen output files for usability. A low score would mean poor indexing results due to audio quality.
Keyword file (XML).
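For illustration, a WebVTT caption file follows the standard layout shown in the following sketch; the cue text and timings here are made up, not actual Indexer output:

```text
WEBVTT

00:00:01.200 --> 00:00:03.450
Welcome to the quarterly planning meeting.

00:00:03.600 --> 00:00:06.100
Let's start with a review of last month's numbers.
```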
This article shows how to create indexing jobs to index an asset and to index multiple files.
Using configuration and manifest files for indexing tasks
You can specify more details for your indexing tasks by using a task configuration. For example, you can specify which metadata to use for your media file. This metadata is used by the language engine to expand its vocabulary, which greatly improves speech recognition accuracy. You can also specify your desired output files.
You can also process multiple media files at once by using a manifest file.
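As a sketch, a manifest is a plain-text list file (extension .lst) that names the media files to process, one per line; the entries below are hypothetical (per error code 2003, entries can also be URLs to media files):

```text
RecordedMeeting1.wmv
Lecture02.mp4
TeamPresentation.mp3
```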
For more information, see Task Preset for Azure Media Indexer.
By default, an indexing job generates the following output files. The files are stored in the first output asset.
When there is more than one input media file, Indexer generates a manifest file for the job outputs, named JobResult.txt. For each input media file, the resulting TTML, WebVTT, and keyword files are sequentially numbered and named by using the "Alias".
| File name | Description |
|---|---|
| Closed caption (CC) files in TTML and WebVTT formats | Can be used to make audio and video files accessible to people with hearing disabilities.<br/>Closed caption files include a tag called Recognizability, which scores an indexing job based on how recognizable the speech in the source video is. You can use the value of Recognizability to screen output files for usability. A low score means poor indexing results due to audio quality. |
| Keyword and info files | The keyword file is an XML file that contains keywords extracted from the speech content, with frequency and offset information.<br/>The info file is a plain-text file that contains granular information about each recognized term. The first line is special and contains the Recognizability score. Each subsequent line is a tab-separated list of the following data: start time, end time, word/phrase, confidence. The times are given in seconds, and the confidence is a number from 0 to 1.<br/>Example line: "1.20 1.45 word 0.67"<br/>These files can be used for a number of purposes: to perform speech analytics, to expose the content to search engines such as Bing, Google, or Microsoft SharePoint to make the media files more discoverable, or even to deliver more relevant ads. |
| JobResult.txt | Output manifest, present only when indexing multiple files, listing each input media file together with its indexing result (see the Error column values described under Partially Succeeded Job). |
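Given the info-file layout described above, the following is a minimal Python sketch that reads the Recognizability score and the per-word timings. The function names and the usability threshold are illustrative assumptions, not part of the Indexer output or API:

```python
def parse_info_file(path):
    """Parse an Azure Media Indexer info file.

    Assumed format (from the description above): the first line holds the
    Recognizability score; each following line is a tab-separated record of
    start time, end time, word/phrase, and confidence.
    """
    with open(path, encoding="utf-8") as f:
        recognizability = float(f.readline().strip())
        words = []
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines defensively
            start, end, text, confidence = line.split("\t")
            words.append((float(start), float(end), text, float(confidence)))
    return recognizability, words


def is_usable(recognizability, threshold=0.5):
    """Screen an output file by its Recognizability score.

    The 0.5 threshold is an arbitrary example; pick a cutoff that suits
    your own quality requirements.
    """
    return recognizability >= threshold
```

You could run `parse_info_file` over each info file in a batch job's output asset and discard (or flag for manual review) any whose score falls below your chosen threshold.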
Partially Succeeded Job
If not all input media files are indexed successfully, the indexing job will fail with error code 4000. For more information, see Error codes.
The same outputs (as for succeeded jobs) are generated. You can refer to the output manifest file to find out which input files failed, according to the Error column values. For input files that failed, the resulting TTML, WebVTT, and keyword files are NOT generated.
Task Preset for Azure Media Indexer
You can customize Azure Media Indexer processing by providing an optional task preset alongside the task. The following describes the format of this configuration XML.
| Element name | Required | Description |
|---|---|---|
| input | false | Asset file(s) that you want to index.<br/>Azure Media Indexer supports the following media file formats: MP4, WMV, MP3, M4A, WMA, AAC, WAV.<br/>You can specify the file name(s) in the name or list attribute of the input element. If you don't specify which asset file to index, the primary file is picked. If no primary asset file is set, the first file in the input asset is indexed.<br/>Note: If you add more than 10 files to the manifest file, the indexing job fails with error code 2006. |
| metadata | false | Metadata for the specified asset file(s), used for vocabulary adaptation. It's useful for preparing Indexer to recognize nonstandard vocabulary words such as proper nouns.<br/>You can supply values for predefined keys. Currently, the following keys are supported:<br/>"title" and "description": used for vocabulary adaptation to tweak the language model for your job and improve speech recognition accuracy. The values seed internet searches to find contextually relevant text documents, and their contents augment the internal dictionary for the duration of your indexing task. |
| features | false | Added in version 1.2. Currently, the only supported feature is speech recognition ("ASR"), which is configured through its own settings keys. |
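Putting the elements together, a task preset might look like the following sketch. The file name and metadata values are made up, and the settings keys under the ASR feature (Language, CaptionFormats, GenerateKeywords) should be treated as illustrative assumptions rather than an authoritative list:

```xml
<?xml version="1.0" encoding="utf-8"?>
<configuration version="2.0">
  <!-- Explicitly name the asset file to index via the name attribute -->
  <input name="RecordedMeeting1.mp4">
    <!-- Metadata values seed vocabulary adaptation (example values) -->
    <metadata key="title" value="Quarterly Planning Meeting" />
    <metadata key="description" value="Review of roadmap, budget, and hiring plans" />
  </input>
  <features>
    <!-- Speech recognition is the only supported feature -->
    <feature name="ASR">
      <settings>
        <add key="Language" value="English" />
        <add key="CaptionFormats" value="ttml;webvtt" />
        <add key="GenerateKeywords" value="true" />
      </settings>
    </feature>
  </features>
</configuration>
```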
In the case of an error, Azure Media Indexer reports one of the following error codes:
| Error code | Error | Possible reasons |
|---|---|---|
| 2000 | Invalid configuration | Invalid configuration. |
| 2001 | Invalid input assets | Missing input assets or empty asset. |
| 2002 | Invalid manifest | Manifest is empty, or manifest contains invalid items. |
| 2003 | Failed to download media file | Invalid URL in manifest file. |
| 2004 | Unsupported protocol | Protocol of media URL is not supported. |
| 2005 | Unsupported file type | Input media file type is not supported. |
| 2006 | Too many input files | There are more than 10 files in the input manifest. |
| 3000 | Failed to decode media file | Unsupported media codec, corrupted media file, or no audio stream in input media. |
| 4000 | Batch indexing partially succeeded | Some of the input media files failed to be indexed. For more information, see Output files. |
| other | Internal errors | Please contact the support team. firstname.lastname@example.org |
Currently, the English and Spanish languages are supported.