Create a custom Image Analysis model (preview)

Image Analysis 4.0 allows you to train a custom model using your own training images. By manually labeling your images, you can train a model to apply custom tags to the images (image classification) or detect custom objects (object detection). Image Analysis 4.0 models are especially effective at few-shot learning, so you can get accurate models with less training data.

This guide shows you how to create and train a custom image classification model. The few places where training an object detection model differs are noted along the way.

Prerequisites

  • Azure subscription - Create one for free
  • Once you have your Azure subscription, create a Vision resource in the Azure portal to get your key and endpoint. If you're following this guide using Vision Studio, you must create your resource in the East US region. If you're using the Python library, you can create it in the East US, West US 2, or West Europe region. After it deploys, select Go to resource. Copy the key and endpoint to a temporary location to use later on.
  • An Azure Storage resource - Create one
  • A set of images with which to train your classification model. You can use the set of sample images on GitHub, or use your own images. You only need about 3-5 images per class.

Note

We don't recommend using custom models in business-critical environments because of potentially high latency. When you train a custom model in Vision Studio, the model belongs to the Vision resource it was trained under, and you can call it using the Analyze Image API. When you make these calls, the custom model is loaded into memory and the prediction infrastructure is initialized. While this happens, you might experience longer-than-expected latency before receiving prediction results.

Train an image classifier (IC) or object detector (OD) on your own data using Image Analysis model customization and Python.

You can run through all of the model customization steps using a Python sample package: run the code in this section as a Python script, or download and run the notebook on a compatible platform.

Tip

This section follows the contents of cognitive_service_vision_model_customization.ipynb, which you can open in GitHub.

Install the Python samples package

Install the sample code for training and predicting with custom models in Python:

pip install cognitive-service-vision-model-customization-python-samples

Authentication

Enter your Azure AI Vision endpoint URL, key, and resource name into the code below.

# Resource and key
import logging
logging.getLogger().setLevel(logging.INFO)
from cognitive_service_vision_model_customization_python_samples import ResourceType

resource_type = ResourceType.SINGLE_SERVICE_RESOURCE # or ResourceType.MULTI_SERVICE_RESOURCE

resource_name = None
multi_service_endpoint = None

if resource_type == ResourceType.SINGLE_SERVICE_RESOURCE:
    resource_name = '{specify_your_resource_name}'
    assert resource_name
else:
    multi_service_endpoint = '{specify_your_service_endpoint}'
    assert multi_service_endpoint

resource_key = '{specify_your_resource_key}'

Prepare a dataset from Azure blob storage

To train a model with your own dataset, the dataset should be arranged in the COCO format described below, hosted on Azure blob storage, and accessible from your Vision resource.

Dataset annotation format

Image Analysis uses the COCO file format for indexing and organizing the training images and their annotations. The following examples show the specific format needed for multiclass classification and object detection.

Image Analysis model customization for classification is different from other kinds of vision training, because the service uses your class names, in addition to the image data, during training. Be sure to provide meaningful category names in the annotations.

Note

In the example dataset, there are few images for the sake of simplicity. Although Florence models achieve great few-shot performance (high model quality even with little data available), it's good to have more data for the model to learn. Our recommendation is to have at least five images per class, and the more the better.

Once your COCO annotation file is prepared, you can use the COCO file verification script to check the format.
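If you want to sanity-check the file yourself first, below is a minimal sketch of the kinds of structural checks such a script performs, using only the Python standard library and the field requirements shown in the examples that follow. Treat it as a quick pre-check, not a replacement for the official verification script.

import json

def quick_check_coco(path, object_detection=False):
    """Minimal structural check of a COCO annotation file (illustrative only)."""
    with open(path) as f:
        coco = json.load(f)  # fails here if the JSON itself is malformed

    image_ids = {img['id'] for img in coco['images']}
    category_ids = {cat['id'] for cat in coco['categories']}

    for img in coco['images']:
        assert 'file_name' in img, f"image {img['id']} is missing file_name"
        assert 'absolute_url' in img or 'coco_url' in img, \
            f"image {img['id']} needs absolute_url or coco_url"

    for ann in coco['annotations']:
        assert ann['image_id'] in image_ids, f"annotation {ann['id']} references an unknown image"
        assert ann['category_id'] in category_ids, f"annotation {ann['id']} references an unknown category"
        if object_detection:
            left, top, width, height = ann['bbox']
            # bbox values are relative to the image size, so all fall in [0, 1]
            assert all(0 <= v <= 1 for v in (left, top, width, height)), \
                f"annotation {ann['id']} has out-of-range bbox values"

quick_check_coco('train_coco.json', object_detection=False)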

Multiclass classification example

{
  "images": [{"id": 1, "width": 224.0, "height": 224.0, "file_name": "images/siberian-kitten.jpg", "absolute_url": "https://{your_blob}.blob.core.windows.net/datasets/cat_dog/images/siberian-kitten.jpg"},
              {"id": 2, "width": 224.0, "height": 224.0, "file_name": "images/kitten-3.jpg", "absolute_url": "https://{your_blob}.blob.core.windows.net/datasets/cat_dog/images/kitten-3.jpg"}],
  "annotations": [
      {"id": 1, "category_id": 1, "image_id": 1},
      {"id": 2, "category_id": 1, "image_id": 2},
  ],
  "categories": [{"id": 1, "name": "cat"}, {"id": 2, "name": "dog"}]
}

Besides absolute_url, you can also use coco_url (the system accepts either field name).
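For example, the first image entry above could equivalently be written with coco_url:

{"id": 1, "width": 224.0, "height": 224.0, "file_name": "images/siberian-kitten.jpg", "coco_url": "https://{your_blob}.blob.core.windows.net/datasets/cat_dog/images/siberian-kitten.jpg"}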

Object detection example

{
  "images": [{"id": 1, "width": 224.0, "height": 224.0, "file_name": "images/siberian-kitten.jpg", "absolute_url": "https://{your_blob}.blob.core.windows.net/datasets/cat_dog/images/siberian-kitten.jpg"},
              {"id": 2, "width": 224.0, "height": 224.0, "file_name": "images/kitten-3.jpg", "absolute_url": "https://{your_blob}.blob.core.windows.net/datasets/cat_dog/images/kitten-3.jpg"}],
  "annotations": [
      {"id": 1, "category_id": 1, "image_id": 1, "bbox": [0.1, 0.1, 0.3, 0.3]},
      {"id": 2, "category_id": 1, "image_id": 2, "bbox": [0.3, 0.3, 0.6, 0.6]},
      {"id": 3, "category_id": 2, "image_id": 2, "bbox": [0.2, 0.2, 0.7, 0.7]}
  ],
  "categories": [{"id": 1, "name": "cat"}, {"id": 2, "name": "dog"}]
}

The values in bbox: [left, top, width, height] are relative to the image width and height.
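For instance, here's a minimal sketch of converting a pixel-space box (measured from the image's top-left corner) to this relative format; the function name and sample numbers are illustrative only:

def to_relative_bbox(left_px, top_px, width_px, height_px, image_width, image_height):
    """Convert a pixel-space [left, top, width, height] box to the
    relative [left, top, width, height] format used in the annotations."""
    return [left_px / image_width, top_px / image_height,
            width_px / image_width, height_px / image_height]

# A 112x112-pixel box at (56, 56) in a 224x224 image becomes [0.25, 0.25, 0.5, 0.5].
print(to_relative_bbox(56, 56, 112, 112, 224, 224))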

Blob storage directory structure

Following the examples above, the data directory in your Azure Blob Container https://{your_blob}.blob.core.windows.net/datasets/ should be arranged as follows, where train_coco.json is the annotation file.

cat_dog/
    images/
        siberian-kitten.jpg
        kitten-3.jpg
    train_coco.json
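If your images and annotation file are still on your local machine, you can upload them into this layout with the azure-storage-blob package. Below is a minimal sketch; the connection string placeholder and the datasets container name are assumptions you should replace with your own values:

import os
from azure.storage.blob import BlobServiceClient  # pip install azure-storage-blob

service = BlobServiceClient.from_connection_string('{your_storage_connection_string}')
container = service.get_container_client('datasets')

# Upload every file under the local cat_dog/ folder, preserving relative paths
for root, _, files in os.walk('cat_dog'):
    for name in files:
        local_path = os.path.join(root, name)
        blob_name = local_path.replace(os.sep, '/')  # e.g. cat_dog/images/siberian-kitten.jpg
        with open(local_path, 'rb') as data:
            container.upload_blob(name=blob_name, data=data, overwrite=True)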

Tip

Quota limit information, including the maximum number of images and categories supported, maximum image size, and so on, can be found on the concept page.

Grant Azure AI Vision access to your Azure data blob

You need to take an extra step to give your Vision resource access to read the contents of your Azure blob storage container. There are two ways to do this.

Option 1: Shared access signature (SAS)

You can generate a SAS token with at least read permission on your Azure Blob Container. This is the option used in the code below. For instructions on acquiring a SAS token, see Create SAS tokens.
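If you prefer to generate the token programmatically rather than in the portal, the azure-storage-blob package provides generate_container_sas. Below is a minimal sketch; the account name, account key, and one-day expiry are placeholder assumptions, and Read and List permissions are a reasonable minimum for training access:

from datetime import datetime, timedelta, timezone
from azure.storage.blob import generate_container_sas, ContainerSasPermissions

sas_token = generate_container_sas(
    account_name='{your_storage_account}',
    container_name='datasets',
    account_key='{your_storage_account_key}',
    permission=ContainerSasPermissions(read=True, list=True),
    expiry=datetime.now(timezone.utc) + timedelta(days=1),  # keep the lifetime short
)
# Pass sas_token (the query string, not a full URL) to the registration code below.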

Option 2: Managed Identity or public access

You can also use Managed Identity to grant access.

Below is a series of steps for allowing the system-assigned Managed Identity of your Vision resource to access your blob storage. In the Azure portal:

  1. Go to the Identity / System assigned tab of your Vision resource, and change the Status to On.
  2. Go to the Access Control (IAM) / Role assignment tab of your blob storage resource, select Add / Add role assignment, and choose either Storage Blob Data Contributor or Storage Blob Data Reader.
  3. Select Next, and choose Managed Identity under Assign access to, and then select Select members.
  4. Choose your subscription, select Azure AI Vision as the Managed Identity type, and look up the identity that matches your Vision resource name.

Register the dataset

Once your dataset has been prepared and hosted on your Azure blob storage container, with access granted to your Vision resource, you can register it with the service.

Note

The service only accesses your storage data during training. It doesn't keep copies of your data beyond the training cycle.

from cognitive_service_vision_model_customization_python_samples import DatasetClient, Dataset, AnnotationKind, AuthenticationKind, Authentication

dataset_name = '{specify_your_dataset_name}'
auth_kind = AuthenticationKind.SAS # or AuthenticationKind.MI

dataset_client = DatasetClient(resource_type, resource_name, multi_service_endpoint, resource_key)
annotation_file_uris = ['{specify_your_annotation_uri}'] # example: https://example_data.blob.core.windows.net/datasets/cat_dog/train_coco.json
# register dataset
if auth_kind == AuthenticationKind.SAS:
    # option 1: SAS; note that the token/query string is needed, not the full URL
    sas_auth = Authentication(AuthenticationKind.SAS, '{your_sas_token}')
    dataset = Dataset(name=dataset_name,
                      annotation_kind=AnnotationKind.MULTICLASS_CLASSIFICATION,  # see AnnotationKind for all annotation kinds
                      annotation_file_uris=annotation_file_uris,
                      authentication=sas_auth)
else:
    # option 2: Managed Identity or publicly accessible storage; if the storage isn't publicly accessible, make sure it's accessible via the managed identity
    dataset = Dataset(name=dataset_name,
                      annotation_kind=AnnotationKind.MULTICLASS_CLASSIFICATION,  # see AnnotationKind for all annotation kinds
                      annotation_file_uris=annotation_file_uris)

reg_dataset = dataset_client.register_dataset(dataset)
logging.info(f'Register dataset: {reg_dataset.__dict__}')

# specify your evaluation dataset here; you can follow the same registration process as for the training dataset
eval_dataset = None
if eval_dataset:
    reg_eval_dataset = dataset_client.register_dataset(eval_dataset)
    logging.info(f'Register eval dataset: {reg_eval_dataset.__dict__}')

Train a model

After you register the dataset, use it to train a custom model:

from cognitive_service_vision_model_customization_python_samples import TrainingClient, Model, ModelKind, TrainingParameters, EvaluationParameters

model_name = '{specify_your_model_name}'

training_client = TrainingClient(resource_type, resource_name, multi_service_endpoint, resource_key)
train_params = TrainingParameters(training_dataset_name=dataset_name, time_budget_in_hours=1, model_kind=ModelKind.GENERIC_IC)  # see ModelKind for all valid model kinds
eval_params = EvaluationParameters(test_dataset_name=eval_dataset.name) if eval_dataset else None
model = Model(model_name, train_params, eval_params)
model = training_client.train_model(model)
logging.info(f'Start training: {model.__dict__}')

Check the training status

Use the following code to check the status of the asynchronous training operation.

from cognitive_service_vision_model_customization_python_samples import TrainingClient

training_client = TrainingClient(resource_type, resource_name, multi_service_endpoint, resource_key)
model = training_client.wait_for_completion(model_name, 30)  # the second argument is the wait interval (in seconds) between status checks

Predict with a sample image

Use the following code to get a prediction with a new sample image.

from cognitive_service_vision_model_customization_python_samples import PredictionClient
prediction_client = PredictionClient(resource_type, resource_name, multi_service_endpoint, resource_key)

with open('path_to_your_test_image.png', 'rb') as f:
    img = f.read()

prediction = prediction_client.predict(model_name, img, content_type='image/png')
logging.info(f'Prediction: {prediction}')

Next steps

In this guide, you created and trained a custom image classification model using Image Analysis. Next, learn more about the Analyze Image 4.0 API, so you can call your custom model from an application using the REST API or client SDKs.
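As a preview of that next step, here's a hedged sketch of calling your custom model through the REST API with the requests library; the route and api-version shown reflect the Image Analysis 4.0 preview at the time of writing, so check the Analyze Image API reference for current values:

import requests

endpoint = 'https://{your_resource_name}.cognitiveservices.azure.com'
model_name = '{specify_your_model_name}'

with open('path_to_your_test_image.png', 'rb') as f:
    image_data = f.read()

response = requests.post(
    f'{endpoint}/computervision/imageanalysis:analyze',
    params={'model-name': model_name, 'api-version': '2023-02-01-preview'},  # preview api-version; an assumption to verify
    headers={'Ocp-Apim-Subscription-Key': '{specify_your_resource_key}',
             'Content-Type': 'application/octet-stream'},
    data=image_data,
)
print(response.json())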