What is Image Analysis?

The Computer Vision Image Analysis service can extract a wide variety of visual features from your images. For example, it can determine whether an image contains adult content, find specific brands or objects, or find human faces.

The latest version of Image Analysis, 4.0, which is now in public preview, has new features like synchronous OCR and people detection. We recommend you use this version going forward.

You can use Image Analysis through a client library SDK or by calling the REST API directly. Follow the quickstart to get started.

Or, you can try out the capabilities of Image Analysis quickly and easily in your browser using Vision Studio.

This documentation contains the following types of articles:

  • The quickstarts are step-by-step instructions that let you make calls to the service and get results in a short period of time.
  • The how-to guides contain instructions for using the service in more specific or customized ways.
  • The conceptual articles provide in-depth explanations of the service's functionality and features.
  • The tutorials are longer guides that show you how to use this service as a component in broader business solutions.

For a more structured approach, follow a Training module for Image Analysis.

Image Analysis features

You can analyze images to provide insights about their visual features and characteristics. All of the features in the list below are provided by the Analyze Image API. Follow a quickstart to get started.

Extract text from images (preview)

Version 4.0 preview of Image Analysis offers the ability to extract text from images. Compared with the async Computer Vision 3.2 GA Read, the new version offers the familiar Read OCR engine in a unified performance-enhanced synchronous API that makes it easy to get all image insights including OCR in a single API operation. Extract text from images

Detect people in images (preview)

Version 4.0 preview of Image Analysis offers the ability to detect people appearing in images. The bounding box coordinates of each detected person are returned, along with a confidence score. People detection

Tag visual features

Identify and tag visual features in an image, from a set of thousands of recognizable objects, living things, scenery, and actions. When the tags are ambiguous or not common knowledge, the API response provides hints to clarify the context of the tag. Tagging isn't limited to the main subject, such as a person in the foreground, but also includes the setting (indoor or outdoor), furniture, tools, plants, animals, accessories, gadgets, and so on. Tag visual features

An images of a skateboarder with tags listen on the right

Detect objects

Object detection is similar to tagging, but the API returns the bounding box coordinates for each tag applied. For example, if an image contains a dog, cat and person, the Detect operation will list those objects together with their coordinates in the image. You can use this functionality to process further relationships between the objects in an image. It also lets you know when there are multiple instances of the same tag in an image. Detect objects

An image of an office with a rectangle drawn around a laptop

Detect brands

Identify commercial brands in images or videos from a database of thousands of global logos. You can use this feature, for example, to discover which brands are most popular on social media or most prevalent in media product placement. Detect brands

Categorize an image

Identify and categorize an entire image, using a category taxonomy with parent/child hereditary hierarchies. Categories can be used alone, or with our new tagging models.
Currently, English is the only supported language for tagging and categorizing images. Categorize an image

Describe an image

Generate a description of an entire image in human-readable language, using complete sentences. Computer Vision's algorithms generate various descriptions based on the objects identified in the image. The descriptions are each evaluated and a confidence score generated. A list is then returned ordered from highest confidence score to lowest. Describe an image

An image of cows with a simple description on the right

Detect faces

Detect faces in an image and provide information about each detected face. Computer Vision returns the coordinates, rectangle, gender, and age for each detected face. Detect faces

You can also use the dedicated Face API for these purposes. It provides more detailed analysis, such as facial identification and pose detection.

Detect image types

Detect characteristics about an image, such as whether an image is a line drawing or the likelihood of whether an image is clip art. Detect image types

Detect domain-specific content

Use domain models to detect and identify domain-specific content in an image, such as celebrities and landmarks. For example, if an image contains people, Computer Vision can use a domain model for celebrities to determine if the people detected in the image are known celebrities. Detect domain-specific content

Detect the color scheme

Analyze color usage within an image. Computer Vision can determine whether an image is black & white or color and, for color images, identify the dominant and accent colors. Detect the color scheme

Get the area of interest / smart crop

Analyze the contents of an image to return the coordinates of the area of interest that matches a specified aspect ratio. Computer Vision returns the bounding box coordinates of the region, so the calling application can modify the original image as desired. Generate a thumbnail

An image of a person on a mountain, with cropped versions to the right

Moderate content in images

You can use Computer Vision to detect adult content in an image and return confidence scores for different classifications. The threshold for flagging content can be set on a sliding scale to accommodate your preferences.

Image requirements

Image Analysis works on images that meet the following requirements:

  • The image must be presented in JPEG, PNG, GIF, or BMP format
  • The file size of the image must be less than 4 megabytes (MB)
  • The dimensions of the image must be greater than 50 x 50 pixels and less than 16,000 x 16,000 pixels

Data privacy and security

As with all of the Cognitive Services, developers using the Computer Vision service should be aware of Microsoft's policies on customer data. See the Cognitive Services page on the Microsoft Trust Center to learn more.

Next steps

Get started with Image Analysis by following the quickstart guide in your preferred development language: