Labels identification

Warning

Over the past year, Azure AI Video Indexer (VI) announced the removal of its dependency on Azure Media Services (AMS) due to its retirement. Feature adjustments and changes were announced, and a migration guide was provided.

The deadline to complete migration was June 30, 2024. VI has extended the update/migrate deadline so you can update your VI account and opt in to the AMS VI asset migration through August 31, 2024.

However, after June 30, if you haven't updated your VI account, you can't index new videos or play any videos that haven't been migrated. If you update your account after June 30, you can resume indexing immediately, but you can't play videos indexed before the account update until they're migrated through the AMS VI migration.

Labels identification is an Azure AI Video Indexer AI feature that identifies visual objects, like sunglasses, or actions, like swimming, that appear in the video footage of a media file. There are many labels identification categories. Once extracted, labels identification instances are displayed in the Insights tab and can be translated into over 50 languages. Clicking a label opens the instance in the media file; select Play Previous or Play Next to see more instances.

Prerequisites

Review Transparency Note overview

View the insight

When working on the website, the instances are displayed in the Insights tab. They can also be generated as a categorized list in a JSON file that includes each label's ID, name, and language, together with the start and end times and confidence score of each instance, as follows:

To display labels identification insights in a JSON file, do the following:

  1. Click Download and then Insights (JSON).

  2. Copy the text and paste it into your JSON viewer.

    "labels": [
        {
        "id": 1,
        "name": "human face",
        "language": "en-US",
        "instances": [
            {
            "confidence": 0.9987,
            "adjustedStart": "0:00:00",
            "adjustedEnd": "0:00:25.6",
            "start": "0:00:00",
            "end": "0:00:25.6"
            },
            {
            "confidence": 0.9989,
            "adjustedStart": "0:01:21.067",
            "adjustedEnd": "0:01:41.334",
            "start": "0:01:21.067",
            "end": "0:01:41.334"
            }
        ]
        },
        {
        "id": 2,
        "name": "person",
        "referenceId": "person",
        "language": "en-US",
        "instances": [
            {
            "confidence": 0.9959,
            "adjustedStart": "0:00:00",
            "adjustedEnd": "0:00:26.667",
            "start": "0:00:00",
            "end": "0:00:26.667"
            },
            {
            "confidence": 0.9974,
            "adjustedStart": "0:01:21.067",
            "adjustedEnd": "0:01:41.334",
            "start": "0:01:21.067",
            "end": "0:01:41.334"
            }
        ]
        },
    

To download the JSON file via the API, use the Azure AI Video Indexer developer portal.
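If you prefer to script the download, the following Python sketch shows one way to fetch the same insights JSON with a Get Video Index request and list the labels it contains. The location, account ID, video ID, and access token are placeholder values you supply yourself, and obtaining the access token (for example, through the developer portal) is assumed to have already happened.

    import requests

    # Placeholder values: replace with your own account details.
    LOCATION = "trial"                    # or your Azure region, for example "eastus"
    ACCOUNT_ID = "<your-account-id>"
    VIDEO_ID = "<your-video-id>"
    ACCESS_TOKEN = "<your-access-token>"  # obtained separately, for example via the developer portal

    # Get Video Index returns the full insights JSON for the video.
    url = (
        f"https://api.videoindexer.ai/{LOCATION}/Accounts/{ACCOUNT_ID}"
        f"/Videos/{VIDEO_ID}/Index"
    )
    response = requests.get(url, params={"accessToken": ACCESS_TOKEN})
    response.raise_for_status()
    index = response.json()

    # The labels insight appears under each video's "insights" section.
    for video in index.get("videos", []):
        for label in video.get("insights", {}).get("labels", []):
            print(label["name"], "-", len(label.get("instances", [])), "instances")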

Labels components

During the Labels procedure, objects in a media file are processed, as follows:

Component: Definition
Source: The user uploads the source file for indexing.
Tagging: Images are tagged and labeled. For example, door, chair, woman, headphones, jeans.
Filtering and aggregation: Tags are filtered according to their confidence level and aggregated according to their category.
Confidence level: The estimated confidence level of each label is calculated as a value between 0 and 1. The confidence score represents the certainty in the accuracy of the result. For example, an 82% certainty is represented as a 0.82 score.
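As a rough illustration of the filtering and aggregation steps (a sketch, not the service's internal implementation), the following Python snippet takes the labels array shown earlier, drops instances below an example confidence threshold of 0.9, and groups what remains by label name:

    from collections import defaultdict

    # "labels" as it appears in the insights JSON shown earlier (abbreviated).
    labels = [
        {
            "id": 1,
            "name": "human face",
            "language": "en-US",
            "instances": [
                {"confidence": 0.9987, "start": "0:00:00", "end": "0:00:25.6"},
                {"confidence": 0.9989, "start": "0:01:21.067", "end": "0:01:41.334"},
            ],
        },
        {
            "id": 2,
            "name": "person",
            "referenceId": "person",
            "language": "en-US",
            "instances": [
                {"confidence": 0.9959, "start": "0:00:00", "end": "0:00:26.667"},
                {"confidence": 0.9974, "start": "0:01:21.067", "end": "0:01:41.334"},
            ],
        },
    ]

    CONFIDENCE_THRESHOLD = 0.9  # example value; tune for your scenario

    # Keep only instances at or above the threshold, grouped by label name.
    filtered = defaultdict(list)
    for label in labels:
        for instance in label["instances"]:
            if instance["confidence"] >= CONFIDENCE_THRESHOLD:
                filtered[label["name"]].append((instance["start"], instance["end"]))

    for name, spans in filtered.items():
        print(f"{name}: {spans}")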

Example use cases

  • Extracting labels from frames for contextual advertising or branding. For example, placing an ad for beer following footage on a beach.
  • Creating a verbal description of footage to enhance accessibility for the visually impaired, for example a background storyteller in movies.
  • Deep searching media archives for insights on specific objects to create feature stories for the news (see the sketch after this list).
  • Using relevant labels to create content for trailers, highlight reels, social media, or news clips.
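To make the deep-search scenario concrete, a short Python sketch like the one below can scan a downloaded insights JSON file and print the time ranges where a given object appears. The file name and search term are example values, and the sketch assumes the videos/insights/labels layout shown earlier.

    import json

    SEARCH_TERM = "beach"              # example object to look for
    INSIGHTS_FILE = "insights.json"    # assumed local copy of the downloaded JSON

    with open(INSIGHTS_FILE, encoding="utf-8") as f:
        index = json.load(f)

    # Walk every video's labels and report where the search term shows up.
    for video in index.get("videos", []):
        for label in video.get("insights", {}).get("labels", []):
            if SEARCH_TERM.lower() in label["name"].lower():
                for instance in label.get("instances", []):
                    print(f'{label["name"]}: {instance["start"]} - {instance["end"]}')

The printed time ranges can then be used to place ads, cut highlights, or jump to the relevant footage in an archive.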

Considerations when choosing a use case

  • Carefully consider the accuracy of the results. To promote more accurate detections, check the quality of the video; low-quality video might affect the detected insights.
  • When using Labels for law enforcement, carefully consider that Labels might not detect parts of the video. To ensure fair and high-quality decisions, combine Labels with human oversight.
  • Don't use labels identification for decisions that may have serious adverse impacts. Machine learning models can result in undetected or incorrect classification output. Decisions based on incorrect output could have serious adverse impacts. Additionally, it's advisable to include human review of decisions that have the potential for serious impacts on individuals.

Learn more about labels identification