What is the difference between Tags and Description["tags"] in the Computer Vision API?

asked 2020-05-13T00:18:25.907+00:00
Mike Ubezzi 2,771 Reputation points

"My team is using the Computer Vision service from the Microsoft Cognitive Services API. Within the JSON output for the images we are submitting, there are two sets of data: one with a key of 'Tags' and one with a key of 'Description["tags"]'.

There appears to be some overlap between the data in these two sections, but there are also tags unique to each, and I do not understand the difference. No one else on the team seems to understand it either.

Can anyone enlighten us? "

[Note: As we migrate from MSDN, this question has been posted by an Azure Cloud Engineer as a frequently asked question] Source: MSDN

Azure Computer Vision
An Azure artificial intelligence service that analyzes content in images and video.

Accepted answer
  1. answered 2020-05-13T10:07:29.26+00:00
    Rohit Mungi 801 Reputation points

    We checked with our product team to understand the inner details of the two fields in the response. Here are more details to clarify.

    The Tags section of the response is produced by a model that is different from the Description[Tags] model, and the two models use different internal threshold settings. This yields different sets of tags: some are common to both sections, while others appear only in the Description[Tags] section, as in the example below.

    {  
      "description": {  
        "tags": ["outdoor", "road", "grass", "path", "trail", "forest", "tree", "side", "area", "narrow", "country", "track", "train", "street", "traveling", "dirt", "covered", "sign", "riding", "standing", "stop", "man", "red", "snow"],  
        "captions": [{  
          "text": "a path with trees on the side of a road",  
          "confidence": 0.965715635493424  
        }]  
      },  
      "requestId": "<id>",  
      "metadata": {  
        "width": 800,  
        "height": 600,  
        "format": "Jpeg"  
      }  
    }  
    The threshold for the Tags model is tuned for precision, while the captioning (Description[Tags]) model is tuned for recall, to encourage more candidate words for captioning an image and generating a sentence.

    If you want to understand precision and recall in more detail, please check the Custom Vision documentation, which explains these concepts.
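    To make the precision/recall trade-off concrete, here is a toy Python illustration. The counts are made up for illustration only; they are not taken from the API.

```python
# Toy illustration of precision vs. recall for an image tagger.
# All counts below are invented for illustration; they are not API output.

def precision(tp, fp):
    # Of the tags the model returned, what fraction were correct?
    return tp / (tp + fp)

def recall(tp, fn):
    # Of the tags that truly apply to the image, what fraction were returned?
    return tp / (tp + fn)

# A high-threshold tagger (like Tags) returns fewer, mostly-correct tags:
print(round(precision(tp=8, fp=1), 3))  # 0.889 -> high precision
print(round(recall(tp=8, fn=6), 3))     # 0.571 -> lower recall
```

    Raising the confidence threshold trims doubtful tags, which raises precision but lowers recall; lowering it does the opposite, which is why the two sections can disagree.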

    So, the two sections serve different customer scenarios: use Tags when precision matters (higher thresholds), and use Description[Tags] when recall matters, i.e. when generating a sentence or caption for an image is the primary objective.
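    As a rough sketch of how you might compare the two sections in practice (the tag names and confidences below are illustrative, not a real API response; this assumes you requested both the Tags and Description visual features in the analyze call):

```python
# Compare the two tag sources in a Computer Vision analyze response.
# The dict below is a trimmed, made-up example of the response shape:
# "tags" is a list of {name, confidence} objects (precision-oriented),
# while "description"."tags" is a plain list of strings (recall-oriented).
response = {
    "tags": [
        {"name": "outdoor", "confidence": 0.99},
        {"name": "tree", "confidence": 0.97},
        {"name": "road", "confidence": 0.95},
    ],
    "description": {
        "tags": ["outdoor", "road", "grass", "path", "tree", "forest"],
    },
}

precise_tags = {t["name"] for t in response["tags"]}
caption_tags = set(response["description"]["tags"])

print("common:", sorted(precise_tags & caption_tags))
print("only in description.tags:", sorted(caption_tags - precise_tags))
```

    The set difference makes the overlap visible: the recall-oriented Description[Tags] list typically contains extra words that the precision-oriented Tags list filtered out.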

    Source: Azure Documentation


0 additional answers
