Does Azure support gathering information from an image?

Question

Hello experts, I am looking for a feature that can extract rich content from an image, not just a one-sentence description. I know many products can read an image and return a short description, but that is not what I want; I want much richer content.

Answer

Hello @sakai

Thanks for reaching out to us with this question. Azure Computer Vision has just released the new Image Analysis 4.0 API, which includes image captioning. For your requirement of rich content, Dense Captions seems to be a good choice:

Image captions in Image Analysis 4.0 (preview) are available through the Caption and Dense Captions features.

Caption generates a one-sentence description of the entire image. Dense Captions provides more detail by generating one-sentence descriptions of up to 10 regions of the image in addition to describing the whole image. Dense Captions also returns the bounding box coordinates of each described region. Both features use the latest Florence-based AI models.
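If you want to try both features from code, here is a minimal sketch using only the Python standard library. Please treat it as an illustration under assumptions, not the official client: the route `/computervision/imageanalysis:analyze`, the `api-version=2023-02-01-preview` value (chosen to match the `modelVersion` in the sample response), the `features` query parameter with the values `caption` and `denseCaptions`, and the `Ocp-Apim-Subscription-Key` header reflect my understanding of the preview REST API and should be checked against the current documentation. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: preview REST route and api-version matching the "modelVersion"
# in the sample response; verify both against the current documentation.
API_PATH = "/computervision/imageanalysis:analyze"
API_VERSION = "2023-02-01-preview"


def build_analyze_request(endpoint: str, key: str, image_url: str,
                          features=("caption", "denseCaptions")) -> Request:
    """Build (but do not send) an Image Analysis 4.0 POST request."""
    # Features are passed as one comma-separated query value (assumption).
    query = urlencode({"api-version": API_VERSION, "features": ",".join(features)})
    url = endpoint.rstrip("/") + API_PATH + "?" + query
    # The image is referenced by a public URL in the JSON body.
    body = json.dumps({"url": image_url}).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
    }
    return Request(url, data=body, headers=headers, method="POST")


def analyze(endpoint: str, key: str, image_url: str,
            features=("caption", "denseCaptions")) -> dict:
    """Send the request and return the parsed JSON response."""
    with urlopen(build_analyze_request(endpoint, key, image_url, features)) as resp:
        return json.load(resp)
```

`analyze(...)` needs a real endpoint and key from a Computer Vision resource in one of the supported regions, while `build_analyze_request(...)` can be inspected offline without sending anything.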

At this time, image captioning is available in English language only.

The following JSON response illustrates what the Image Analysis 4.0 API returns when generating dense captions for the example image:

(Example image: a photo of a tractor on a farm)

{
  "denseCaptionsResult": {
    "values": [
      {
        "text": "a man driving a tractor in a farm",
        "confidence": 0.535620927810669,
        "boundingBox": {
          "x": 0,
          "y": 0,
          "w": 850,
          "h": 567
        }
      },
      {
        "text": "a man driving a tractor in a field",
        "confidence": 0.5428450107574463,
        "boundingBox": {
          "x": 132,
          "y": 266,
          "w": 209,
          "h": 219
        }
      },
      {
        "text": "a blurry image of a tree",
        "confidence": 0.5139822363853455,
        "boundingBox": {
          "x": 147,
          "y": 126,
          "w": 76,
          "h": 131
        }
      },
      {
        "text": "a man riding a tractor",
        "confidence": 0.4799223840236664,
        "boundingBox": {
          "x": 206,
          "y": 264,
          "w": 64,
          "h": 97
        }
      },
      {
        "text": "a blue sky above a hill",
        "confidence": 0.35495415329933167,
        "boundingBox": {
          "x": 0,
          "y": 0,
          "w": 837,
          "h": 166
        }
      },
      {
        "text": "a tractor in a field",
        "confidence": 0.47338250279426575,
        "boundingBox": {
          "x": 0,
          "y": 243,
          "w": 838,
          "h": 311
        }
      }
    ]
  },
  "modelVersion": "2023-02-01-preview",
  "metadata": {
    "width": 850,
    "height": 567
  }
}
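Once you have a response like the one above, a small helper can separate the whole-image caption from the region captions. This is a minimal sketch with a hypothetical function name. It assumes, as in this example, that the whole-image caption is the entry whose bounding box spans the full `width` and `height` reported in `metadata`. It also converts each `x`, `y`, `w`, `h` box into corner coordinates and sorts the regions by confidence.

```python
def summarize_dense_captions(response: dict, min_confidence: float = 0.0):
    """Split a dense-captions response into (whole_image_caption, regions).

    Assumption: the whole-image entry is the one whose bounding box covers the
    full image size given in response["metadata"].
    """
    width = response["metadata"]["width"]
    height = response["metadata"]["height"]
    whole_image = None
    regions = []
    for item in response["denseCaptionsResult"]["values"]:
        box = item["boundingBox"]
        if (box["x"], box["y"], box["w"], box["h"]) == (0, 0, width, height):
            whole_image = item["text"]
        elif item["confidence"] >= min_confidence:
            # Convert (x, y, w, h) into (left, top, right, bottom) corners.
            corners = (box["x"], box["y"], box["x"] + box["w"], box["y"] + box["h"])
            regions.append((item["text"], item["confidence"], corners))
    # Most confident region captions first.
    regions.sort(key=lambda r: r[1], reverse=True)
    return whole_image, regions
```

Applied to the response above, it returns "a man driving a tractor in a farm" as the whole-image caption and the five region captions ordered by confidence, from 0.54 ("a man driving a tractor in a field") down to 0.35 ("a blue sky above a hill").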

If this is what you are looking for, please refer to the documentation here: https://learn.microsoft.com/en-us/azure/cognitive-services/computer-vision/concept-describe-images-40?tabs=dense

Please be aware that image captioning in Image Analysis 4.0 is currently available only in the following Azure regions: East US, France Central, Korea Central, North Europe, Southeast Asia, West Europe, and West US. You must use a Computer Vision resource located in one of these regions to get results from the Caption and Dense Captions features.
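If you create resources from scripts, a quick pre-check can catch a resource in an unsupported region before you call the API. This is a minimal sketch with hypothetical names; the region set is only a snapshot of the list above and will likely change, and it assumes the usual Azure programmatic region names (for example `eastus` for "East US").

```python
# Snapshot of the regions listed above; the supported list may change,
# so check the documentation rather than relying on this set.
CAPTION_REGIONS = {
    "eastus", "francecentral", "koreacentral", "northeurope",
    "southeastasia", "westeurope", "westus",
}


def normalize_region(name: str) -> str:
    """Turn a display name such as 'East US' into the 'eastus' form."""
    return "".join(name.split()).lower()


def supports_captioning(region: str) -> bool:
    """Return True if a resource in this region can use Caption / Dense Captions."""
    return normalize_region(region) in CAPTION_REGIONS
```

Note that similarly named regions such as "West US 2" normalize to `westus2` and are therefore treated as unsupported, which matches the list above.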

I hope this helps!

Regards,

Yutong

- Please accept the answer and vote 'Yes' if you found it helpful, to support the community. Thanks a lot.
