Types of vision API services

Azure Cognitive Service for Vision is one of the broadest categories in Cognitive Services. You can use the APIs to incorporate vision features like image analysis, face detection, spatial analysis, and optical character recognition (OCR) in your applications, even if you have limited knowledge of machine learning.


Here are some broad categories of vision APIs:

  • Computer Vision provides advanced algorithms that process images and return information based on the visual features you're interested in. It includes four services: OCR, Face service, Image Analysis, and Spatial Analysis. Form Recognizer is an advanced version of OCR.
  • Custom Vision is an image recognition service that you can use to build, deploy, and improve your own image identifier models.
  • Face service provides AI algorithms that detect, recognize, and analyze human faces in images.

How to choose a service

The following flow chart can help you choose a vision service for your specific use case:

Diagram that shows how to choose a vision service.

Common use cases

  • Computer Vision

    • Describe an image. Analyze an image, evaluate the objects that are detected, and generate a human-readable phrase or sentence that describes the image.
    • Tag visual features. Apply tags that are based on a set of thousands of recognizable objects.
    • Categorize an image. Categorize images based on their content.
    • Implement OCR. Detect printed and handwritten text in images.
    • Detect image types. For example, identify clip art images or line drawings.
    • Detect color schemes. Identify the dominant foreground and background colors, the overall dominant colors, and the accent color in an image.
    • Generate thumbnails. Create small versions of images.
    • Moderate content. Detect images that contain adult content or depict gory scenes.
    • Detect domain-specific content. Use two specialized domain models:
      • Celebrities. Identify thousands of well-known celebrities from sports, entertainment, and business domains.
      • Landmarks. Identify famous landmarks, like the Taj Mahal and the Statue of Liberty.
    • Detect objects. Identify common objects and return the coordinates of a bounding box.
    • Detect brands. Identify logos from an existing database of thousands of globally recognized product logos.
    • Detect faces. Detect and analyze human faces in an image. You can estimate the subject's age and get a bounding box that marks the location of each face. The facial analysis capabilities of the Computer Vision service are a subset of the ones provided by the dedicated Face service.
  • Custom Vision

    • Classify images. Predict a category, or class, based on a set of inputs, which are called features. Calculate a probability score for each possible class and return a label that indicates the class that the object most likely belongs to. To use this model, you need data that consists of features and their labels.
    • Detect objects. Locate objects in an image and get the bounding-box coordinates of each one. To train this model, you need images in which each object is labeled and marked with a bounding box.
  • Face service

    • Detect faces. Identify the regions of an image that contain a human face, typically by returning bounding-box coordinates that form a rectangle around the face.
    • Analyze faces. Return information, such as facial landmarks (nose, eyes, eyebrows, lips, and more). You can use these facial landmarks as features to train a machine learning model that can infer information about people, like their perceived age or emotional state.
    • Recognize faces. Train a machine learning model to identify known individuals from their facial features.
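Many of the Computer Vision use cases listed above correspond to request parameters of a single image analysis operation. The following is a minimal sketch, assuming the Computer Vision REST API v3.2 `analyze` operation, an endpoint and key from your own resource, and an image reachable by URL. The use-case labels such as `describe` and the names `USE_CASE_TO_FEATURE` and `build_analyze_request` are hypothetical; other API versions use different paths. The sketch only builds the HTTP request with the Python standard library and doesn't send it. In that API version, OCR (the Read operation) and thumbnail generation are separate operations and aren't covered here.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical labels for the use cases above, mapped to the
# Image Analysis v3.2 "visualFeatures" values (assumed API version).
USE_CASE_TO_FEATURE = {
    "describe": "Description",
    "tag": "Tags",
    "categorize": "Categories",
    "image_type": "ImageType",
    "color": "Color",
    "moderate": "Adult",
    "objects": "Objects",
    "brands": "Brands",
    "faces": "Faces",
}

# The two domain-specific models are requested through the "details" parameter.
DOMAIN_MODELS = {"celebrities": "Celebrities", "landmarks": "Landmarks"}


def build_analyze_request(endpoint, key, image_url, use_cases, domains=()):
    """Build, but do not send, a v3.2 analyze request for an image URL."""
    params = {
        "visualFeatures": ",".join(USE_CASE_TO_FEATURE[u] for u in use_cases)
    }
    if domains:
        params["details"] = ",".join(DOMAIN_MODELS[d] for d in domains)
    # Keep commas unescaped so the feature list stays readable.
    query = urllib.parse.urlencode(params, safe=",")
    url = f"{endpoint.rstrip('/')}/vision/v3.2/analyze?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps({"url": image_url}).encode("utf-8"),
        method="POST",
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
    )
```

For example, `build_analyze_request("https://my-resource.cognitiveservices.azure.com", key, image_url, ["describe", "categorize"], domains=["landmarks"])` produces a POST to `https://my-resource.cognitiveservices.azure.com/vision/v3.2/analyze?visualFeatures=Description,Categories&details=Landmarks`. Passing the request to `urllib.request.urlopen` would send it.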
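Custom Vision and the Face service are called over HTTP in a similar way. This minimal sketch assumes the Custom Vision Prediction API v3.0 (`classify` and `detect` against a published iteration) and the Face API v1.0 `detect` operation. The endpoints, keys, project ID, and published iteration name are placeholders from your own resources, and the helper names are hypothetical. It only builds the requests with the Python standard library and doesn't send them. The face request asks for landmarks and turns off face IDs, which are used by the recognition operations rather than by detection and analysis.

```python
import json
import urllib.parse
import urllib.request


def _json_post(url, headers, payload):
    """Hypothetical helper: build, but do not send, a JSON POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={**headers, "Content-Type": "application/json"},
    )


def custom_vision_prediction_request(endpoint, prediction_key, project_id,
                                     published_name, image_url, task="classify"):
    """Custom Vision Prediction API v3.0 (assumed); task is 'classify' or 'detect'."""
    if task not in ("classify", "detect"):
        raise ValueError("task must be 'classify' or 'detect'")
    url = (f"{endpoint.rstrip('/')}/customvision/v3.0/Prediction/"
           f"{project_id}/{task}/iterations/{published_name}/url")
    return _json_post(url, {"Prediction-Key": prediction_key}, {"Url": image_url})


def face_detect_request(endpoint, key, image_url):
    """Face API v1.0 detect (assumed): face rectangles plus facial landmarks."""
    params = urllib.parse.urlencode({
        "returnFaceId": "false",           # IDs are only needed for recognition
        "returnFaceLandmarks": "true",     # nose, eyes, eyebrows, lips, ...
        "detectionModel": "detection_01",  # this model returns landmarks
    })
    url = f"{endpoint.rstrip('/')}/face/v1.0/detect?{params}"
    return _json_post(url, {"Ocp-Apim-Subscription-Key": key}, {"url": image_url})
```

Passing either request to `urllib.request.urlopen` would send it. The classify response is expected to list a probability for each tag, which corresponds to the per-class scores described in the Custom Vision use cases.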


This article is maintained by Microsoft. It was originally written by the following contributors.

Principal authors:

Other contributors:


Next steps