Get observed people tracking and matched faces insights

Important

The deadline for migrating Azure Video Indexer content due to the Azure Media Services retirement has passed. See the retirement guide for more information.

Observed people tracking and matched faces

Important

Access to face identification, customization, and celebrity recognition features is limited based on eligibility and usage criteria to support our Responsible AI principles. These features are available only to Microsoft managed customers and partners. Use the Face Recognition intake form to apply for access.

Observed people tracking and matched faces automatically detect and match people in media files. The feature can be set to display insights about detected people, their clothing, and the exact timeframe of their appearance.

In the web portal, the resulting insights are displayed in a categorized list on the Insights tab. The tab includes a thumbnail of each person and their ID. Selecting the thumbnail of a person displays the matched person (the corresponding face in the People insight). Insights are also generated in a categorized list in a JSON file that includes the thumbnail ID of the person, the percentage of time they appear in the file, a Wiki link (if they're a celebrity), and the confidence level.

Observed people tracking and matched faces use cases

  • Tracking a person’s movement, for example, in law enforcement for more efficiency when analyzing an accident or crime.
  • Improving efficiency by deep searching for matched people in organizational archives for insight on specific celebrities, for example, when creating promos and trailers.
  • Improving efficiency when creating feature stories, for example, searching for people wearing a red shirt in the archives of a football game at a news or sports agency.

View the insight JSON with the web portal

Once you have uploaded and indexed a video, insights are available in JSON format for download using the web portal.

  1. Select the Library tab.
  2. Select media you want to work with.
  3. Select Download and then Insights (JSON). The JSON file opens in a new browser tab.
  4. Look for the key pair described in the example response. A short parsing sketch follows these steps.
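
If you want to inspect the downloaded file programmatically rather than in the browser, the following minimal sketch (Python) loads the saved JSON and prints the observed people entries. The file name insights.json is hypothetical, and the exact nesting can vary between index versions, so adjust the path to observedPeople if needed.

    # Minimal sketch: load the downloaded insights JSON and list observed people.
    # "insights.json" is a hypothetical file name; use whatever name you saved the download under.
    import json

    with open("insights.json", encoding="utf-8") as f:
        index = json.load(f)

    # The observedPeople list typically sits under each video's insights.
    for video in index.get("videos", []):
        for person in video.get("insights", {}).get("observedPeople", []):
            print(person["id"], person["thumbnailId"], person.get("matchingFace"))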

Use the API

  1. Use the Get Video Index request. We recommend passing &includeSummarizedInsights=false.
  2. Look for the key pair described in the example response. A short request sketch follows these steps.
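
A minimal request sketch (Python, using the requests library) is shown below. The location, account ID, video ID, and access token values are placeholders you supply yourself; the sketch passes includeSummarizedInsights=false as recommended and counts the observed people in the response.

    # Minimal sketch: call Get Video Index and pull out the observedPeople list.
    # All placeholder values are hypothetical; substitute your own location, IDs, and token.
    import requests

    location = "trial"
    account_id = "<account-id>"
    video_id = "<video-id>"
    access_token = "<access-token>"

    url = f"https://api.videoindexer.ai/{location}/Accounts/{account_id}/Videos/{video_id}/Index"
    params = {"accessToken": access_token, "includeSummarizedInsights": "false"}

    response = requests.get(url, params=params)
    response.raise_for_status()
    index = response.json()

    observed = [
        person
        for video in index.get("videos", [])
        for person in video.get("insights", {}).get("observedPeople", [])
    ]
    print(f"Observed people detected: {len(observed)}")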

Example response

    "observedPeople": [
      {
        "id": 1,
        "thumbnailId": "4addcebf-6c51-42cd-b8e0-aedefc9d8f6b",
        "clothing": [
          {
            "id": 1,
            "type": "sleeve",
            "properties": {
              "length": "long"
            }
          },
          {
            "id": 2,
            "type": "pants",
            "properties": {
              "length": "long"
            }
          }
        ],
        "instances": [
          {
            "adjustedStart": "0:00:00.0667333",
            "adjustedEnd": "0:00:12.012",
            "start": "0:00:00.0667333",
            "end": "0:00:12.012"
          }
        ]
      },
      {
        "id": 2,
        "thumbnailId": "858903a7-254a-438e-92fd-69f8bdb2ac88",
        "clothing": [
          {
            "id": 1,
            "type": "sleeve",
            "properties": {
              "length": "short"
            }
          }
        ],
        "instances": [
          {
            "adjustedStart": "0:00:23.2565666",
            "adjustedEnd": "0:00:25.4921333",
            "start": "0:00:23.2565666",
            "end": "0:00:25.4921333"
          },
          {
            "adjustedStart": "0:00:25.8925333",
            "adjustedEnd": "0:00:25.9926333",
            "start": "0:00:25.8925333",
            "end": "0:00:25.9926333"
          },
          {
            "adjustedStart": "0:00:26.3930333",
            "adjustedEnd": "0:00:28.5618666",
            "start": "0:00:26.3930333",
            "end": "0:00:28.5618666"
          }
        ]
      },
      {
        "id": 3,
        "thumbnailId": "1406252d-e7f5-43dc-852d-853f652b39b6",
        "clothing": [
          {
            "id": 1,
            "type": "sleeve",
            "properties": {
                "length": "short"
            }
          },
          {
            "id": 2,
            "type": "pants",
            "properties": {
                "length": "long"
            }
          },
          {
            "id": 3,
            "type": "skirtAndDress"
          }
        ],
        "instances": [
          {
            "adjustedStart": "0:00:31.9652666",
            "adjustedEnd": "0:00:34.4010333",
            "start": "0:00:31.9652666",
            "end": "0:00:34.4010333"
          }
        ]
      },
      {
        "id": 4,
        "thumbnailId": "d09ad62e-e0a4-42e5-8ca9-9a640c686596",
        "clothing": [
          {
            "id": 1,
            "type": "sleeve",
            "properties": {
                "length": "short"
            }
          },
          {
            "id": 2,
            "type": "pants",
            "properties": {
                "length": "short"
            }
          }
        ],
        "matchingFace": {
          "id": 1310,
          "confidence": 0.3819
        },
        "instances": [
          {
            "adjustedStart": "0:00:34.8681666",
            "adjustedEnd": "0:00:36.0026333",
            "start": "0:00:34.8681666",
            "end": "0:00:36.0026333"
          },
          {
            "adjustedStart": "0:00:36.6699666",
            "adjustedEnd": "0:00:36.7367",
            "start": "0:00:36.6699666",
            "end": "0:00:36.7367"
          },
          {
            "adjustedStart": "0:00:37.2038333",
            "adjustedEnd": "0:00:39.6729666",
            "start": "0:00:37.2038333",
            "end": "0:00:39.6729666"
          }
        ]
      }
    ]
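
The instances timestamps and the optional matchingFace block can be combined to summarize how long each person is on screen and how confidently they were matched. The following sketch (Python) assumes observed_people holds the observedPeople list shown above (obtained via the download or API steps earlier); the helper name to_seconds is illustrative.

    # Minimal sketch: summarize on-screen duration and matched face confidence
    # for each entry in the observedPeople list shown above.
    def to_seconds(timestamp: str) -> float:
        """Convert an "H:MM:SS.fffffff" timestamp into seconds."""
        hours, minutes, seconds = timestamp.split(":")
        return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

    # observed_people is assumed to be the observedPeople list from the index JSON.
    for person in observed_people:
        on_screen = sum(
            to_seconds(i["end"]) - to_seconds(i["start"]) for i in person["instances"]
        )
        face = person.get("matchingFace")
        confidence = face["confidence"] if face else "no matched face"
        print(f"person {person['id']}: {on_screen:.2f}s on screen, confidence: {confidence}")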

Components

Component | Definition
Source file | The user uploads the source file for indexing.
Detection | The media file is tracked to detect observed people and their clothing, for example, a shirt with long sleeves, a dress, or long pants. Note that to be detected, the full upper body of the person must appear in the media.
Local grouping | The identified observed faces are filtered into local groups. If a person is detected more than once, additional observed faces instances are created for this person.
Matching and classification | The observed people instances are matched to faces. If there is a known celebrity, the observed person is given their name. Any number of observed people instances can be matched to the same face.
Confidence value | The estimated confidence level of each observed person is calculated as a range of 0 to 1. The confidence score represents the certainty in the accuracy of the result. For example, 82% certainty is represented as a 0.82 score.
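
Because the confidence value is reported on a 0 to 1 scale, it can be used directly to filter matches. The following sketch (Python) keeps only observed people whose matched face meets a threshold; the 0.8 value is an arbitrary example, not a recommended setting, and observed_people is assumed to hold the observedPeople list as in the earlier sketches.

    # Minimal sketch: filter observed people by matched face confidence.
    # The threshold below is an arbitrary example value, not a recommendation.
    CONFIDENCE_THRESHOLD = 0.8

    confident_matches = [
        person
        for person in observed_people
        if person.get("matchingFace", {}).get("confidence", 0.0) >= CONFIDENCE_THRESHOLD
    ]
    print(f"{len(confident_matches)} observed people matched a face with confidence >= {CONFIDENCE_THRESHOLD}")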

Transparency notes

Important

It is important to read the transparency note overview for all VI features. Each insight also has transparency notes of its own:

  • People are generally not detected if they appear small (minimum person height is 100 pixels).
  • Maximum frame size is FHD.
  • Low quality video (for example, dark lighting conditions) may impact the detection results.
  • The recommended frame rate is at least 30 FPS.
  • Recommended video input should contain up to 10 people in a single frame. The feature could work with more people in a single frame, but the detection result retrieves up to the 10 people with the highest detection confidence in a frame.
  • People with similar clothes (for example, people wearing uniforms or players in sports games) could be detected as the same person with the same ID number.
  • Obstruction: there may be errors where there are obstructions (scene/self or obstructions by other people).
  • Pose: the tracks may be split due to different poses (back/front).

Sample code

See all samples for VI