Text-based emotion detection

Text-based emotion detection detects emotions in a video's transcript lines. Each sentence is detected as Anger, Fear, Joy, Sad, or None if no other emotion was detected.

Important

The model works on text only (labeling emotions in video transcripts). It doesn't infer the emotional state of people and may not perform well where input is ambiguous or unclear, such as sarcastic remarks. Thus, the model shouldn't be used for things like assessing employee performance or the emotional state of a person.

Text-based emotion detection use cases

  • Content Creators and Video Editors - Content creators and video editors can use the system to analyze the emotions expressed in the text transcripts of their videos. The analysis helps them gain insights into the emotional tone of their content, allowing them to fine-tune the narrative, adjust pacing, or ensure the intended emotional impact on the audience.
  • Media Analysts and Researchers - Media analysts and researchers can employ the system to analyze the emotional content of a large volume of video transcripts quickly. They can use the emotional timeline generated by the system to identify trends, patterns, or emotional responses in specific topics or areas of interest.
  • Marketing and Advertising Professionals - Marketing and advertising professionals can utilize the system to assess the emotional reception of their campaigns or video advertisements. Understanding the emotions evoked by their content helps them tailor messages more effectively and gauge the success of their campaigns.
  • Video Consumers and Viewers - End-users, such as viewers or consumers of video content, can benefit from the system by understanding the emotional context of videos without having to watch them entirely. It's useful for users who want to decide if a video is worth watching or for people with limited time to spare.
  • Entertainment Industry Professionals - Professionals in the entertainment industry, such as movie producers or directors, can utilize the system to gauge the emotional effect of their film scripts or storylines, aiding in script refinement and audience engagement.

Note

Text-based emotion detection is language independent; however, if the transcript isn't in English, it's first translated to English and only then is the model applied. This may reduce the accuracy of emotion detection for non-English languages.

View the insight JSON with the web portal

Once you have uploaded and indexed a video, insights are available in JSON format for download using the web portal.

  1. Select the Library tab.
  2. Select media you want to work with.
  3. Select Download and then Insights (JSON). The JSON file opens in a new browser tab.
  4. Look for the key pair described in the example response.

Use the API

  1. Use the Get Video Index request. We recommend passing &includeSummarizedInsights=false (see the example call after these steps).
  2. Look for the key pairs described in the example response.
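
For example, a minimal Get Video Index call with Python's requests library might look like the following sketch. The location, account ID, video ID, and access token are placeholders you must supply; the variable names and the timeout value are assumptions for illustration.

import requests

# Placeholder values; substitute your own account details and a valid access token.
LOCATION = "trial"          # or your Azure region, for example "eastus"
ACCOUNT_ID = "<account-id>"
VIDEO_ID = "<video-id>"
ACCESS_TOKEN = "<access-token>"

url = (
    f"https://api.videoindexer.ai/{LOCATION}"
    f"/Accounts/{ACCOUNT_ID}/Videos/{VIDEO_ID}/Index"
)
params = {
    "accessToken": ACCESS_TOKEN,
    # Recommended above: skip the summarized insights block.
    "includeSummarizedInsights": "false",
}

response = requests.get(url, params=params, timeout=30)
response.raise_for_status()
index = response.json()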

Example response

"emotions": [ 
  { 
    "id": 1, 
    "type": "Sad", 
    "instances": [ 
      { 
        "confidence": 0.5518, 
        "adjustedStart": "0:00:00", 
        "adjustedEnd": "0:00:05.75", 
        "start": "0:00:00", 
        "end": "0:00:05.75" 
      }
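
As a sketch of how to locate these key pairs programmatically, the snippet below walks a downloaded index file. The file name insights.json is a placeholder, and the videos -> insights nesting reflects the layout of a full index response; adjust as needed for your file.

import json

# "insights.json" is a placeholder name for the downloaded file.
with open("insights.json") as f:
    index = json.load(f)

# In a full index response, the "emotions" list sits under each video's insights.
for video in index.get("videos", []):
    for emotion in video.get("insights", {}).get("emotions", []):
        for instance in emotion.get("instances", []):
            print(f'{emotion["type"]}: {instance["start"]} -> {instance["end"]} '
                  f'(confidence {instance["confidence"]})')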

Important

It is important to read the transparency note overview for all VI features. Each insight also has transparency notes of its own:

Text-based emotion detection notes

  • This model is designed to help detect emotions in the transcript of a video. However, it isn't suitable for making assessments about an individual's emotional state, their ability, or their overall performance.
  • This emotion detection model is intended to help determine the sentiment behind sentences in the video’s transcript. However, it only works on the text itself, and might not perform well for sarcastic input or in cases where input might be ambiguous or unclear.
  • To increase the accuracy of this model, it's recommended that input data be in a clear and unambiguous format. Users should also note that this model doesn't have context about input data, which can affect its accuracy.
  • This model can produce both false positives and false negatives. To reduce the likelihood of either, users are advised to follow best practices for input data and preprocessing, and to interpret outputs in the context of other relevant information. It's important to note that the system doesn't have any context of the input data.
  • The outputs of this model should NOT be used to make assessments about an individual's emotional state or other human characteristics. This model is supported in English and might not function properly with non-English inputs. Non-English inputs are translated to English before entering the model and therefore might produce less accurate results.
  • The model should never be used to evaluate employee performance or to monitor individuals.
  • The model should never be used for making assessments about a person, their emotional state, or their ability.
  • The results of the model can be inaccurate and should be treated with caution.
  • The confidence of the model in its prediction must also be taken into account (see the filtering sketch after this list).
  • Non-English videos produce less accurate results.
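
To make the last two points concrete, a consumer of the output might drop low-confidence detections before using them. This is a minimal sketch; the 0.7 cutoff is an arbitrary assumption, not a documented threshold.

# Keep only emotion instances above an assumed, arbitrary confidence cutoff.
MIN_CONFIDENCE = 0.7

def confident_instances(emotions):
    return [
        (emotion["type"], instance)
        for emotion in emotions
        for instance in emotion["instances"]
        if instance["confidence"] >= MIN_CONFIDENCE
    ]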

Text-based emotion detection components

During the emotion detection procedure, the transcript of the video is processed by the following components:

  • Source language - The user uploads the source file for indexing.
  • Transcription API - The audio file is sent to Azure AI services, and the transcribed (and translated, if needed) output is returned. If a language is specified, it's processed.
  • Emotions detection - Each sentence is sent to the emotions detection model. The model produces a confidence level for each emotion. If the confidence level exceeds a specific threshold, and there's no ambiguity between positive and negative emotions, the emotion is detected. In any other case, the sentence is labeled as neutral (see the sketch after this list).
  • Confidence level - The estimated confidence level of each detected emotion is calculated in the range of 0 to 1. The confidence score represents the certainty in the accuracy of the result. For example, 82% certainty is represented as a score of 0.82.
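
The following sketch illustrates that decision rule. The threshold value and the positive/negative groupings are assumptions for demonstration only; the service's actual parameters aren't published.

# Illustrative sketch of the decision rule described above.
# THRESHOLD and the emotion groupings are assumed values, not the service's.
POSITIVE = {"Joy"}
NEGATIVE = {"Anger", "Fear", "Sad"}
THRESHOLD = 0.5

def label_sentence(scores):
    """Pick a label from per-emotion confidence scores for one sentence."""
    above = {emotion: s for emotion, s in scores.items() if s > THRESHOLD}
    if not above:
        return "None"
    # Ambiguity between positive and negative emotions: label as neutral.
    if above.keys() & POSITIVE and above.keys() & NEGATIVE:
        return "None"
    return max(above, key=above.get)

print(label_sentence({"Anger": 0.10, "Fear": 0.20, "Joy": 0.70, "Sad": 0.05}))  # Joy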

Sample code

See all samples for VI