Text-based emotion detection insight overview

Warning

Over the past year, Azure AI Video Indexer (VI) announced the removal of its dependency on Azure Media Services (AMS) due to its retirement. Features adjustments and changes were announced and a migration guide was provided.

The deadline to complete migration was June 30, 2024. VI has extended the update/migrate deadline so you can update your VI account and opt in to the AMS VI asset migration through July 15th, 2024. To use the AMS VI asset migration, you also must extend your AMS account through July. Navigate to your AMS account in the Azure portal and select Click here to extend.

However, after June 30, if you have not updated your VI account, you won't be able to index new videos nor will you be able to play any videos that have not been migrated. If you update your account after June 30, you can resume indexing immediately but you won't be able to play videos indexed before the account update until they are migrated through the AMS VI migration.

Text-based emotion detection

Emotions detection detects emotions in video's transcript lines. Each sentence can either be detected as Anger, Fear, Joy, Sad, None if no other emotion was detected.

Important

The model works on text only (labeling emotions in video transcripts.) This model doesn't infer the emotional state of people, may not perform where input is ambiguous or unclear, like sarcastic remarks. Thus, the model shouldn't be used for things like assessing employee performance or the emotional state of a person.

Text-based emotion detection use cases

  • Content Creators and Video Editors - Content creators and video editors can use the system to analyze the emotions expressed in the text transcripts of their videos. The analysis helps them gain insights into the emotional tone of their content, allowing them to fine-tune the narrative, adjust pacing, or ensure the intended emotional impact on the audience.
  • Media Analysts and Researchers - Media analysts and researchers can employ the system to analyze the emotional content of a large volume of video transcripts quickly. They can use the emotional timeline generated by the system to identify trends, patterns, or emotional responses in specific topics or areas of interest.
  • Marketing and Advertising Professionals - Marketing and advertising professionals can utilize the system to assess the emotional reception of their campaigns or video advertisements. Understanding the emotions evoked by their content, helps them tailor messages more effectively and gauge the success of their campaigns.
  • Video Consumers and Viewers - End-users, such as viewers or consumers of video content, can benefit from the system by understanding the emotional context of videos without having to watch them entirely. It's useful for users who want to decide if a video is worth watching or for people with limited time to spare.
  • Entertainment Industry Professionals - Professionals in the entertainment industry, such as movie producers or directors, can utilize the system to gauge the emotional effect of their film scripts or storylines, aiding in script refinement and audience engagement.

Note

Text-based emotion detection is language independent, however if the transcript is not in English, it is first being translated to English and only then the model is applied. This may cause a reduced accuracy in emotions detection for non English languages.

View the insight JSON with the web portal

Once you have uploaded and indexed a video, insights are available in JSON format for download using the web portal.

  1. Select the Library tab.
  2. Select media you want to work with.
  3. Select Download and the Insights (JSON). The JSON file opens in a new browser tab.
  4. Look for the key pair described in the example response.

Use the API

  1. Use the Get Video Index request. We recommend passing &includeSummarizedInsights=false.
  2. Look for the key pair described in the example response.
"emotions": [ 
  { 
    "id": 1, 
    "type": "Sad", 
    "instances": [ 
      { 
        "confidence": 0.5518, 
        "adjustedStart": "0:00:00", 
        "adjustedEnd": "0:00:05.75", 
        "start": "0:00:00", 
        "end": "0:00:05.75" 
      }