Get named entities extraction insights

10/09/2024

Named entities extraction

Named entities extraction uses Natural Language Processing (NLP) to extract insights on the locations, people, and brands appearing in audio and images in media files. The named entities extraction insight uses transcription and optical character recognition (OCR).

Named entities use cases

Contextual advertising, for example, placing an ad for a pizza chain following footage of Italy.
Deep searching media archives for insights on people or locations to create feature stories for the news.
Creating a verbal description of footage via OCR processing to enhance accessibility for the visually impaired, for example, a background storyteller in movies.
Extracting insights on brand names.

View the insight JSON with the web portal

Once you have uploaded and indexed a video, insights are available in JSON format for download using the web portal.

Select the Library tab.
Select the media you want to work with.
Select Download, and then select Insights (JSON). The JSON file opens in a new browser tab.
Look for the key pairs described in the example response.

Use the API

Use the Get Video Index request. We recommend passing &includeSummarizedInsights=false.
Look for the key pairs described in the example response.
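
As a rough sketch of the request above, the following Python builds a Get Video Index URL with includeSummarizedInsights=false. The base URL, the path layout, and the placeholder location, account ID, video ID, and access token are assumptions for illustration; check the API reference for the exact form that applies to your account.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumed base URL for the Video Indexer API; verify against the API reference.
API_BASE = "https://api.videoindexer.ai"


def build_get_video_index_url(location, account_id, video_id, access_token):
    """Return a Get Video Index URL that asks for full (not summarized) insights."""
    # Percent-encode each path segment so unusual IDs cannot break the URL.
    segments = (location, "Accounts", account_id, "Videos", video_id, "Index")
    path = "/".join(quote(segment, safe="") for segment in segments)
    query = urlencode({
        "accessToken": access_token,
        "includeSummarizedInsights": "false",
    })
    return f"{API_BASE}/{path}?{query}"


def get_video_index(url):
    """Fetch the URL and parse the JSON body (hypothetical helper, needs network access)."""
    with urlopen(url) as response:
        return json.load(response)
```

Calling build_get_video_index_url with your own values and passing the result to get_video_index would return the index as a Python dictionary, in which you can look for the key pairs described in the example response.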

Example response

    namedPeople: [
      {
        referenceId: "Satya_Nadella",
        referenceUrl: "https://en.wikipedia.org/wiki/Satya_Nadella",
        confidence: 1,
        description: "CEO of Microsoft Corporation",
        seenDuration: 33.2,
        id: 2,
        name: "Satya Nadella",
        appearances: [
          {
            startTime: "0:01:11.04",
            endTime: "0:01:17.36",
            startSeconds: 71,
            endSeconds: 77.4
          },
          {
            startTime: "0:01:31.83",
            endTime: "0:01:37.1303666",
            startSeconds: 91.8,
            endSeconds: 97.1
          },
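
To illustrate how the fields in the example response relate to each other, the sketch below converts the startTime and endTime strings to seconds and totals the length of the listed appearances. The sample dictionary copies only values shown in the example, rewritten as valid Python; the helper names are hypothetical.

```python
def to_seconds(timestamp):
    """Convert an 'h:mm:ss.fff' string such as '0:01:11.04' to seconds."""
    hours, minutes, seconds = timestamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def total_appearance_seconds(entity):
    """Sum the duration of every appearance listed for one named entity."""
    return sum(
        to_seconds(appearance["endTime"]) - to_seconds(appearance["startTime"])
        for appearance in entity["appearances"]
    )


# Values copied from the example response; only two appearances are shown there.
person = {
    "name": "Satya Nadella",
    "confidence": 1,
    "appearances": [
        {"startTime": "0:01:11.04", "endTime": "0:01:17.36"},
        {"startTime": "0:01:31.83", "endTime": "0:01:37.1303666"},
    ],
}

print(person["name"], round(total_appearance_seconds(person), 2))
```

Because the example response is cut off after the second appearance, this total covers only the two appearances shown and should not be expected to match seenDuration.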

Important

It is important to read the transparency note overview for all VI features. Each insight also has transparency notes of its own:

Named entities notes

Carefully consider the accuracy of the results. To promote more accurate detections, check the quality of the audio and images. Low-quality audio and images might impact the detected insights.
Named entities only detect insights in audio and images. Logos in a brand name may not be detected.
When using named entities for law enforcement, carefully consider that named entities may not always detect parts of the audio. To ensure fair and high-quality decisions, always combine named entities with human oversight.
Don't use named entities for decisions that may have serious adverse impacts on individuals and groups. Machine learning models that extract text can result in undetected or incorrect text output. Your decisions based on incorrect output could have serious adverse impacts that must be avoided. You should always include human review of determinations that have the potential for serious impacts on individuals.

Components

During the named entities extraction procedure, the media file is processed as follows:

Component	Definition
Source file	The user uploads the source file for indexing.
Text extraction	The audio file is sent to the Speech Services API to extract the transcription. Sampled frames are sent to the Azure AI Vision API to extract text with OCR.
Analytics	The insights are then sent to the Text Analytics API to extract the entities. For example, Microsoft, Paris or a person’s name like Paul or Sarah.
Processing and consolidation	The results are then processed. Where applicable, Wikipedia links are added and brands are identified via the Video Indexer built-in and customizable branding lists.
Confidence value	The estimated confidence level of each named entity is calculated as a value from 0 to 1. The confidence score represents the certainty in the accuracy of the result. For example, an 82% certainty is represented as a 0.82 score.
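
Because each entity carries a confidence value between 0 and 1, a consumer of the insights could drop low-confidence detections before acting on them. The sketch below assumes a list of entity dictionaries shaped like the example response. The function name, the 0.82 threshold, and the second sample entity are made up for illustration and aren't part of the service.

```python
def filter_by_confidence(entities, threshold=0.82):
    """Keep only entities whose confidence is at least the threshold."""
    if not 0 <= threshold <= 1:
        raise ValueError("threshold must be between 0 and 1")
    # Entities without a confidence key are treated as confidence 0 (an assumption).
    return [entity for entity in entities if entity.get("confidence", 0) >= threshold]


# The first entry mirrors the example response; the second is invented sample data.
entities = [
    {"name": "Satya Nadella", "confidence": 1},
    {"name": "Example Person", "confidence": 0.5},
]

kept = filter_by_confidence(entities)
print([entity["name"] for entity in kept])
```

A filter like this only narrows what gets reviewed. It doesn't replace the human oversight described in the named entities notes.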

Sample code

See all samples for VI
