What is the difference between Tags and Description["tags"] in the Computer Vision API?

Mike Ubezzi 2,771 Reputation points

"My team is using the Computer Vision service from the Microsoft Cognitive Services API. Within the JSON output for the images we submit, there are two sets of data, one with a key of 'Tags' and one with a key of 'Description["tags"]'.

There appears to be some overlap between the data in these two sections, but there are also unique tags in both. I do not understand the difference, and no one else on the team seems to understand it either.

Can anyone enlighten us? "

[Note: As we migrate from MSDN, this question has been posted by an Azure Cloud Engineer as a frequently asked question] Source: MSDN

Azure Computer Vision
An Azure artificial intelligence service that analyzes content in images and video.

Accepted answer
  1. Rohit Mungi 801 Reputation points

    We checked with our product team to understand the inner details of the two fields in the response. Here are the details.

    The Tags section of the response comes from a different model than the Description[Tags] section, and the two use different internal threshold settings. As a result they return different sets of tags: some tags are common to both, and some appear only in the Description[Tags] section, as in the response excerpt below.

      "description": {
        "tags": ["outdoor", "road", "grass", "path", "trail", "forest", "tree", "side", "area", "narrow", "country", "track", "train", "street", "traveling", "dirt", "covered", "sign", "riding", "standing", "stop", "man", "red", "snow"],
        "captions": [{
          "text": "a path with trees on the side of a road",
          "confidence": 0.965715635493424
        }]
      },
      "requestId": "<id>",
      "metadata": {
        "width": 800,
        "height": 600,
        "format": "Jpeg"
      }

    The threshold in the Tags section is tuned for precision, while the captioning, or Description[Tags], section is tuned for recall, which yields more words for captioning an image and generating sentences.
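
    To make the precision and recall trade-off concrete, here is a minimal Python sketch. It is not how the service works internally: the scores, the ground-truth set, and both thresholds are made-up values, and `precision_recall` and `tags_above` are hypothetical helper names chosen for this illustration.

```python
# Hypothetical per-tag scores for one image (invented numbers, not API output).
scored_tags = {
    "outdoor": 0.99, "tree": 0.97, "road": 0.93, "grass": 0.88,
    "path": 0.71, "forest": 0.64, "sign": 0.35, "train": 0.22, "snow": 0.12,
}

# Hypothetical ground truth: tags a human reviewer would accept for the image.
ground_truth = {"outdoor", "tree", "road", "grass", "path", "forest"}


def precision_recall(predicted, truth):
    """Return (precision, recall) for a predicted tag set against the truth set."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


def tags_above(scores, threshold):
    """Keep only the tags whose score is at or above the threshold."""
    return {name for name, score in scores.items() if score >= threshold}


# A high threshold keeps few tags (precision-oriented, like Tags);
# a low threshold keeps many tags (recall-oriented, like Description[Tags]).
strict = tags_above(scored_tags, 0.85)
loose = tags_above(scored_tags, 0.10)

strict_p, strict_r = precision_recall(strict, ground_truth)
loose_p, loose_r = precision_recall(loose, ground_truth)

print(f"strict: precision={strict_p:.2f} recall={strict_r:.2f}")
print(f"loose:  precision={loose_p:.2f} recall={loose_r:.2f}")
```

    With these invented numbers, the strict threshold keeps four tags that are all correct but misses two accepted tags, while the loose threshold finds every accepted tag at the cost of three wrong ones.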

    If you want more detail on precision and recall, see the Custom Vision documentation, which explains these concepts.

    So both outputs exist to cover different customer scenarios: use Tags, with its higher thresholds, when precision matters, or use Description[Tags] when recall matters, such as when generating a sentence or text description of an image is the primary objective.
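
    If you want to see how the two sections differ for your own images, a short Python sketch like the one below can list the overlap. It assumes the parsed response is a dict in which the top-level "tags" entries are objects with a "name" field and "description"."tags" is a plain list of strings; check this against your actual output. `compare_tag_sets` is a hypothetical helper, and the sample values are illustrative rather than a real response.

```python
def compare_tag_sets(analysis):
    """Split tags into those shared by both sections and those unique to each."""
    # Assumption: top-level "tags" is a list of {"name": ..., "confidence": ...}.
    top_level = {entry["name"] for entry in analysis.get("tags", [])}
    # Assumption: "description"."tags" is a list of plain strings.
    description = set(analysis.get("description", {}).get("tags", []))
    return {
        "common": sorted(top_level & description),
        "only_in_tags": sorted(top_level - description),
        "only_in_description": sorted(description - top_level),
    }


# Illustrative sample, not a real service response.
sample = {
    "tags": [
        {"name": "outdoor", "confidence": 0.99},
        {"name": "tree", "confidence": 0.98},
        {"name": "plant", "confidence": 0.90},
    ],
    "description": {"tags": ["outdoor", "road", "grass", "tree"]},
}

result = compare_tag_sets(sample)
for section, names in result.items():
    print(f"{section}: {names}")
```

    Running this over a batch of responses gives a quick view of which tags come only from the precision-oriented section and which only from the recall-oriented one.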

    Source: Azure Documentation
