We checked with our product team to understand the inner details of the two fields in the response. Here are more details to clarify this.
The Tags section of the response is produced by a model that is different from the Description[Tags] model, and the two use different threshold settings internally. As a result, they return different sets of tags: some tags appear in both sections, while others appear only in the Description[Tags] section, as in the response below.
{
  "description": {
    "tags": ["outdoor", "road", "grass", "path", "trail", "forest", "tree", "side", "area", "narrow", "country", "track", "train", "street", "traveling", "dirt", "covered", "sign", "riding", "standing", "stop", "man", "red", "snow"],
    "captions": [{
      "text": "a path with trees on the side of a road",
      "confidence": 0.965715635493424
    }]
  },
  "requestId": "<id>",
  "metadata": {
    "width": 800,
    "height": 600,
    "format": "Jpeg"
  }
}
The threshold used for the Tags section is tuned for precision, while the Description[Tags] section is tuned for recall, so that more candidate words are available for captioning an image and generating a sentence.
For more detail on precision and recall, please see the Custom Vision documentation, which explains these concepts.
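To make the trade-off concrete, here is a small illustrative sketch with made-up tag sets (not real output from the service): the high-threshold list gets every tag right but misses concepts, while the low-threshold list covers more concepts at the cost of some incorrect tags.

# Illustrative only: hypothetical ground-truth and predicted tag sets.
ground_truth = {"outdoor", "road", "grass", "path", "tree", "forest", "dirt", "sign"}

precise_tags = {"outdoor", "road", "path", "tree"}                  # high-threshold style
recall_tags = {"outdoor", "road", "grass", "path", "tree",
               "forest", "dirt", "train", "snow", "man"}            # low-threshold style

def precision(predicted, truth):
    # Fraction of predicted tags that are correct.
    return len(predicted & truth) / len(predicted)

def recall(predicted, truth):
    # Fraction of true tags that were predicted.
    return len(predicted & truth) / len(truth)

print(precision(precise_tags, ground_truth), recall(precise_tags, ground_truth))  # 1.0, 0.5
print(precision(recall_tags, ground_truth), recall(recall_tags, ground_truth))    # 0.7, 0.875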
So, the two sections exist to support different customer scenarios: use Tags when precision with higher thresholds is the priority, or use Description[Tags] when recall is the priority, i.e. when generating a sentence or caption for an image is the main objective.
Source: Azure Documentation