Find and redact (blur) faces with the Face Detector preset



Azure Media Services will be retired June 30th, 2024. For more information, see the AMS Retirement Guide.


As outlined in Microsoft’s Responsible AI Standard, Microsoft is committed to fairness, privacy, security, and transparency with respect to AI systems. To align with this standard, Azure Media Services is retiring the Video Analyzer preset on September 14, 2023. This preset currently lets you extract multiple video and audio insights from a video file. Customers can replace their current workflows with the more advanced feature set offered by Azure Video Indexer.

The Azure Media Services v3 API includes a Face Detector preset that offers scalable face detection and redaction (blurring) in the cloud. Face redaction lets you modify a video to blur the faces of selected individuals. You may want to use the face redaction service in public safety and news media scenarios. A few minutes of footage that contains multiple faces can take hours to redact manually, but with this preset the face redaction process takes just a few steps.

Compliance, privacy, and security

As an important reminder, you must comply with all applicable laws in your use of analytics in Azure Media Services. You must not use Azure Media Services or any other Azure service in a manner that violates the rights of others. Before uploading any videos, including any biometric data, to Azure Media Services for processing and storage, you must have all the proper rights, including all appropriate consents, from the individuals in the video. To learn about compliance, privacy, and security in Azure Media Services, see the Azure Cognitive Services Terms. For Microsoft’s privacy obligations and handling of your data, review Microsoft’s Privacy Statement, the Online Services Terms (OST), and the Data Processing Addendum (“DPA”). More privacy information, including information on data retention and deletion/destruction, is available in the OST. By using Azure Media Services, you agree to be bound by the Cognitive Services Terms, the OST, the DPA, and the Privacy Statement.

Face redaction modes

Facial redaction works by detecting faces in every frame of video and tracking each face object both forwards and backwards in time, so that the same individual can be blurred from other angles as well. The automated redaction process is complex and is not guaranteed to blur every face. For this reason, the preset can be used in a two-pass mode that improves the quality and accuracy of the blurring through an editing stage before the file is submitted for the final blur pass.

In addition to a fully automatic Combined mode, the two-pass workflow lets you choose the faces you wish to blur (or not blur) via a list of face IDs. To make arbitrary per-frame adjustments, the preset takes a metadata file in JSON format as input to the second pass. This workflow is split into Analyze and Redact modes.

You can also run both tasks in a single job; this mode is called Combined. The sample code in this article shows how to use the simplified single-pass Combined mode on a sample source file.

Combined mode

Combined mode produces a redacted MP4 video file in a single pass, with no manual editing of the JSON file required. The output in the asset folder for the job is a single .mp4 file that contains blurred faces using the selected blur effect. For the best redaction results, set the resolution property to SourceResolution.

Stage | File Name | Notes
Input asset | "ignite-sample.mp4" | Video in WMV, MOV, or MP4 format
Preset config | Face Detector configuration | mode: FaceRedactorMode.Combined, blurType: BlurType.Med, resolution: AnalysisResolution.SourceResolution
Output asset | "ignite-redacted.mp4" | Video with blurring effect applied to faces
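
As a rough illustration of the preset configuration in the table above, the sketch below builds the settings as a plain Python dictionary and validates them against the mode and blur type names used in this article. The helper name `face_detector_preset` is hypothetical, and the `@odata.type` key is an assumption about how a REST-style request body identifies the preset; check the API reference for the SDK you use before relying on the exact shape.

```python
import json

# Values named in this article; anything else is rejected.
MODES = {"Analyze", "Redact", "Combined"}
BLUR_TYPES = {"Low", "Med", "High", "Box", "Black"}


def face_detector_preset(mode="Combined", blur_type="Med",
                         resolution="SourceResolution"):
    """Return a dict describing a Face Detector preset (hypothetical helper)."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    preset = {
        # Assumed discriminator for a REST-style body; not taken from this article.
        "@odata.type": "#Microsoft.Media.FaceDetectorPreset",
        "mode": mode,
        "resolution": resolution,
    }
    # A blur type only matters when the job redacts (Combined or Redact).
    if mode != "Analyze":
        if blur_type not in BLUR_TYPES:
            raise ValueError(f"unknown blur type: {blur_type!r}")
        preset["blurType"] = blur_type
    return preset


print(json.dumps(face_detector_preset(), indent=2))
```

Validating the names locally is a design choice that surfaces typos before a job is submitted; it does not replace the service's own validation.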

Analyze mode

The Analyze pass of the two-pass workflow takes a video input and produces a JSON file with a list of the face locations and face IDs, plus a JPG image of each detected face. Note that face IDs are not guaranteed to be identical on subsequent runs of the analysis pass.

Stage | File Name | Notes
Input asset | "ignite-sample.mp4" | Video in WMV, MOV, or MP4 format
Preset config | Face Detector configuration | mode: FaceRedactorMode.Analyze, resolution: AnalysisResolution.SourceResolution
Output asset | ignite-sample_annotations.json | Annotation data of face locations in JSON format. Face IDs are not guaranteed to be identical on subsequent runs of the analysis pass. You can edit this file to modify the blurring bounding boxes. See the sample below.
Output asset | foo_thumb%06d.jpg [foo_thumb000001.jpg, foo_thumb000002.jpg] | A cropped JPG of each detected face, where the number indicates the labelId of the face

Output example

{
  "version": 1,
  "timescale": 24000,
  "offset": 0,
  "framerate": 23.976,
  "width": 1280,
  "height": 720,
  "fragments": [
    {
      "start": 0,
      "duration": 48048,
      "interval": 1001,
      "events": [
        [
          {
            "index": 13,
            "id": 1138,
            "x": 0.29537,
            "y": -0.18987,
            "width": 0.36239,
            "height": 0.80335
          },
          {
            "index": 13,
            "id": 2028,
            "x": 0.60427,
            "y": 0.16098,
            "width": 0.26958,
            "height": 0.57943
          }
        ],

    ... truncated
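
To decide which faces to blur, it helps to see which face IDs the Analyze pass produced. The following is a minimal sketch, assuming the annotations follow the structure of the sample above (an `events` list that holds one list of faces per interval). The inline data is abbreviated from the sample, with one empty interval added for illustration, and the function name `count_face_ids` is hypothetical.

```python
import json
from collections import Counter

# Abbreviated annotations modeled on the sample above; a real file is much longer.
ANNOTATIONS_JSON = """
{
  "version": 1, "timescale": 24000, "offset": 0, "framerate": 23.976,
  "width": 1280, "height": 720,
  "fragments": [
    {
      "start": 0, "duration": 48048, "interval": 1001,
      "events": [
        [],
        [
          {"index": 13, "id": 1138, "x": 0.29537, "y": -0.18987,
           "width": 0.36239, "height": 0.80335},
          {"index": 13, "id": 2028, "x": 0.60427, "y": 0.16098,
           "width": 0.26958, "height": 0.57943}
        ]
      ]
    }
  ]
}
"""


def count_face_ids(annotations):
    """Count how many intervals mention each face ID."""
    counts = Counter()
    for fragment in annotations.get("fragments", []):
        # "events" is a list of per-interval lists; an empty list means no faces.
        for faces_in_interval in fragment.get("events", []):
            for face in faces_in_interval:
                counts[face["id"]] += 1
    return counts


counts = count_face_ids(json.loads(ANNOTATIONS_JSON))
for face_id, n in sorted(counts.items()):
    print(f"face {face_id}: seen in {n} interval(s)")
```

Because face IDs can differ between runs, counts like these should be recomputed from the annotations file that will actually be passed to the Redact pass.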

Redact (blur) mode

The second pass of the workflow takes a larger number of inputs that must be combined into a single asset.

This includes a list of IDs to blur, the original video, and the annotations JSON. This mode uses the annotations to apply blurring on the input video.

The output from the Analyze pass does not include the original video. The video needs to be uploaded into the input asset for the Redact mode task and selected as the primary file.

Stage | File Name | Notes
Input asset | "ignite-sample.mp4" | Video in WMV, MOV, or MP4 format. Same video as in the Analyze pass.
Input asset | "ignite-sample_annotations.json" | Annotations metadata file from the Analyze pass, with optional modifications if you wish to change the faces blurred. This must be edited in an external application, code, or text editor.
Input asset | "ignite-sample_IDList.txt" (Optional) | Newline-separated list of face IDs to redact. If left blank, all faces in the source will have blur applied. You can use the list to selectively choose not to blur specific faces.
Face Detector preset | Preset configuration | mode: FaceRedactorMode.Redact, blurType: BlurType.Med
Output asset | "ignite-sample-redacted.mp4" | Video with blurring applied based on annotations
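
The optional ID list described in the table is a newline-separated text file, so it is easy to generate from code. The sketch below is one possible way to build its contents; the function name `build_id_list` is hypothetical, and the IDs are the ones from the sample annotations in this article.

```python
def build_id_list(all_face_ids, keep_unblurred=()):
    """Return newline-separated face IDs to redact, skipping IDs to keep visible."""
    keep = set(keep_unblurred)
    to_blur = sorted(set(all_face_ids) - keep)
    return "\n".join(str(face_id) for face_id in to_blur)


# IDs taken from the sample annotations in this article.
id_list_text = build_id_list([1138, 2028, 1138], keep_unblurred=[2028])
print(id_list_text)

# Writing the file is left to the caller; the file name follows the table above.
# with open("ignite-sample_IDList.txt", "w", encoding="utf-8") as f:
#     f.write(id_list_text + "\n")
```

Note that an empty result is worth checking for explicitly: according to the table, a blank list blurs all faces rather than none.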

Example output

This is the output from an IDList with one ID selected. Face IDs are not guaranteed to be identical on subsequent runs of the analysis pass.

Example foo_IDList.txt


Blur types

In Combined or Redact mode, you can choose from five blur types via the JSON input configuration: Low, Med, High, Box, and Black. Med is used by default.

You can find samples of the blur types below.


Low resolution blur setting example.


Medium resolution blur setting example.


High resolution blur setting example.


Box mode for use in debugging your output.


Black box mode covers all faces with black boxes.

Elements of the output JSON file

The redaction media processor (MP) provides high-precision face location detection and tracking that can detect up to 64 human faces in a video frame. Frontal faces provide the best results, while side faces and small faces (less than or equal to 24x24 pixels) are challenging.

The job produces a JSON output file that contains metadata about detected and tracked faces. The metadata includes coordinates indicating the location of faces, as well as a face ID number indicating the tracking of that individual. Face ID numbers are prone to reset under circumstances when the frontal face is lost or overlapped in the frame, resulting in some individuals getting assigned multiple IDs.

The output JSON includes the following elements:

Root JSON elements

Element | Description
version | The version of the Video API.
timescale | "Ticks" per second of the video.
offset | The time offset for timestamps. In version 1.0 of the Video APIs, this is always 0. This value may change in future scenarios.
width, height | The width and height of the output video frame, in pixels.
framerate | Frames per second of the video.
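
Since every time value in the file is expressed in ticks, a small conversion helper makes the numbers easier to read. The sketch below uses the values from the sample output in this article (`timescale` 24000, `interval` 1001, `duration` 48048); the function name `ticks_to_seconds` is hypothetical.

```python
def ticks_to_seconds(ticks, timescale):
    """Convert a tick count to seconds using the root-level timescale."""
    return ticks / timescale


timescale = 24000   # ticks per second
interval = 1001     # ticks between event entries
duration = 48048    # fragment length in ticks

print(ticks_to_seconds(duration, timescale))  # fragment length in seconds
print(timescale / interval)                   # event entries per second
print(duration // interval)                   # event entries in the fragment
```

With these sample values, the number of event entries per second works out to the reported `framerate` of 23.976 when rounded to three decimal places, which suggests one entry per frame.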

Fragments JSON elements

Element | Description
start | The start time of the first event, in "ticks."
duration | The length of the fragment, in "ticks."
index | (Applies to Azure Media Redactor only) The frame index of the current event.
interval | The interval of each event entry within the fragment, in "ticks."
events | Each event contains the faces detected and tracked within that time duration. It is an array of arrays. The outer array represents one interval of time. The inner array consists of 0 or more events that happened at that point in time. An empty bracket [] means no faces were detected.
id | The ID of the face that is being tracked. This number may inadvertently change if a face becomes undetected. A given individual should have the same ID throughout the overall video, but this cannot be guaranteed due to limitations in the detection algorithm (occlusion, etc.).
x, y | The upper-left X and Y coordinates of the face bounding box, on a normalized scale of 0.0 to 1.0. X and Y coordinates are always relative to landscape orientation, so for a portrait video (or an upside-down one, in the case of iOS), you must transpose the coordinates accordingly.
width, height | The width and height of the face bounding box, on a normalized scale of 0.0 to 1.0.
facesDetected | Found at the end of the JSON results; summarizes the number of faces that the algorithm detected during the video. Because the IDs can be reset inadvertently if a face becomes undetected (for example, the face goes off screen or looks away), this number may not always equal the true number of faces in the video.
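
Because the bounding box values are normalized, they have to be scaled by the frame size before they can be drawn or compared in pixels. The following sketch converts one face entry from the sample output into pixel edges for a 1280x720 frame. The function name `to_pixel_box` is hypothetical. Clamping to the frame is an assumption made here because the sample contains a negative `y`; the sketch does not handle the portrait transposition described in the table.

```python
def to_pixel_box(face, frame_width, frame_height):
    """Convert a normalized face box to clamped integer pixel edges.

    Returns (left, top, right, bottom).
    """
    left = face["x"] * frame_width
    top = face["y"] * frame_height
    right = (face["x"] + face["width"]) * frame_width
    bottom = (face["y"] + face["height"]) * frame_height

    def clamp(value, upper):
        # Keep the edge inside the frame, then round to the nearest pixel.
        return int(round(min(max(value, 0), upper)))

    return (clamp(left, frame_width), clamp(top, frame_height),
            clamp(right, frame_width), clamp(bottom, frame_height))


# First face entry from the sample output in this article.
face = {"index": 13, "id": 1138, "x": 0.29537, "y": -0.18987,
        "width": 0.36239, "height": 0.80335}
box = to_pixel_box(face, frame_width=1280, frame_height=720)
print(box)  # (378, 0, 842, 442)
```

The same scaling could be used in reverse when editing the annotations file: adjust the box in pixels, then divide by the frame width and height before writing the values back.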
