What is Video Retrieval?

2025-06-09

Caution

On 30 June 2025, Azure AI Vision Video Retrieval was retired. The decision to retire this feature was part of our ongoing effort to improve and simplify the features offered for video processing. Migrate to Azure AI Content Understanding and Azure AI Search to benefit from their additional capabilities.

Video processing: Video Retrieval vs. Azure AI Content Understanding

Feature	Video Retrieval for video description	Azure AI Content Understanding
Video Length Supported	Optimized for short videos, up to ~3 minutes	Supports short & long videos, up to 4 hours
Frame Processing	Up to 20 frames	Batch processing, with frames sampled shot by shot across the entire video
Content Extraction Pre-Processing	Transcription	Transcription, Shot identification, Face grouping
Structured Output Support	Not supported	Supports schema-conforming structured outputs
Data types	Video supported	Video, images, documents, and speech supported
Pricing	Variable Token-based	Fixed cost per minute of video processed

To migrate to Content Understanding for video summaries and descriptions, review the Azure AI Content Understanding documentation.

Video Search: Video Retrieval vs. Azure AI Search and Content Understanding

Feature	Video Retrieval for video search	Azure AI Search and Content Understanding
Visual Embedding type	Frame-based Image Embeddings	Video description text embeddings
Content Extraction Pre-Processing	Transcription, OCR	Transcription, Shot identification, Face grouping
People & Object search support	Strong support	Strong support
Action and Event support	Limited	Strong support
Customization	None	The Content Understanding analyzer can be customized to focus on specific content through its fields and field descriptions

To start building a search use case with Content Understanding, we recommend this sample, which shows how to use Azure AI Search to search videos.

Video Retrieval is a service that lets you create a search index, add documents (videos and images) to it, and search with natural language. Developers can define metadata schemas for each index and ingest metadata to the service to help with retrieval. Developers can also specify what features to extract from the index (vision, speech) and filter their search based on features.

Call the Video Retrieval APIs

Input requirements

Supported formats

File format	Description
`asf`	ASF (Advanced / Active Streaming Format)
`avi`	AVI (Audio Video Interleaved)
`flv`	FLV (Flash Video)
`matroskamm`, `webm`	Matroska / WebM
`mov`, `mp4`, `m4a`, `3gp`, `3g2`, `mj2`	QuickTime / MOV

Supported video codecs

Codec	Format
`h264`	H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10
`h265`	H.265/HEVC
`libvpx-vp9`	libvpx VP9 (codec vp9)
`mpeg4`	MPEG-4 part 2

Supported audio codecs

Codec	Format
`aac`	AAC (Advanced Audio Coding)
`mp3`	MP3 (MPEG audio layer 3)
`pcm`	PCM (uncompressed)
`vorbis`	Vorbis
`wmav2`	Windows Media Audio 2
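
The three tables above can be combined into a simple pre-flight check. The following is a minimal sketch, not part of any Azure SDK: the constant names and the `check_input` function are hypothetical, and the identifiers are copied as they appear in the tables. Names reported by a media probing tool may not match these identifiers exactly, so a mapping step may be needed in practice.

```python
# Identifiers copied from the tables above. All names in this block
# (constants and function) are hypothetical and chosen for illustration.
SUPPORTED_FORMATS = {
    "asf", "avi", "flv", "matroskamm", "webm",
    "mov", "mp4", "m4a", "3gp", "3g2", "mj2",
}
SUPPORTED_VIDEO_CODECS = {"h264", "h265", "libvpx-vp9", "mpeg4"}
SUPPORTED_AUDIO_CODECS = {"aac", "mp3", "pcm", "vorbis", "wmav2"}


def check_input(file_format, video_codec, audio_codec=None):
    """Return a list of problems; an empty list means every identifier is listed."""
    problems = []
    if file_format.lower() not in SUPPORTED_FORMATS:
        problems.append(f"unsupported file format: {file_format}")
    if video_codec.lower() not in SUPPORTED_VIDEO_CODECS:
        problems.append(f"unsupported video codec: {video_codec}")
    # The audio codec is only checked when one is given (assumption:
    # a video without an audio track has nothing to validate here).
    if audio_codec is not None and audio_codec.lower() not in SUPPORTED_AUDIO_CODECS:
        problems.append(f"unsupported audio codec: {audio_codec}")
    return problems


if __name__ == "__main__":
    print(check_input("mp4", "h264", "aac"))
    print(check_input("mkv", "av1", "opus"))
```

Returning a list of problems rather than a single boolean lets a caller report every mismatch at once before re-encoding a file.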
