Train a small object detection model with AutoML (preview) (v1)

APPLIES TO: Python SDK azureml v1

Important

Some of the Azure CLI commands in this article use the azure-cli-ml, or v1, extension for Azure Machine Learning. Support for the v1 extension will end on September 30, 2025. You will be able to install and use the v1 extension until that date.

We recommend that you transition to the ml, or v2, extension before September 30, 2025. For more information on the v2 extension, see Azure ML CLI extension and Python SDK v2.

Important

This feature is currently in public preview. This preview version is provided without a service-level agreement. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.

In this article, you'll learn how to train an object detection model to detect small objects in high-resolution images with automated ML in Azure Machine Learning.

Typically, computer vision models for object detection work well for datasets with relatively large objects. However, due to memory and computational constraints, these models tend to under-perform when tasked to detect small objects in high-resolution images. Because high-resolution images are typically large, they are resized before input into the model, which limits their capability to detect smaller objects--relative to the initial image size.

To help with this problem, automated ML supports tiling as part of the public preview computer vision capabilities. The tiling capability in automated ML is based on the concepts in The Power of Tiling for Small Object Detection.

When tiling, each image is divided into a grid of tiles. Adjacent tiles overlap with each other in width and height dimensions. The tiles are cropped from the original as shown in the following image.

Diagram that shows an image being divided into a grid of overlapping tiles.

Prerequisites

Supported models

Small object detection using tiling is supported for all models supported by Automated ML for images for object detection task.

Enable tiling during training

To enable tiling, you can set the tile_grid_size parameter to a value like (3, 2); where 3 is the number of tiles along the width dimension and 2 is the number of tiles along the height dimension. When this parameter is set to (3, 2), each image is split into a grid of 3 x 2 tiles. Each tile overlaps with the adjacent tiles, so that any objects that fall on the tile border are included completely in one of the tiles. This overlap can be controlled by the tile_overlap_ratio parameter, which defaults to 25%.

When tiling is enabled, the entire image and the tiles generated from it are passed through the model. These images and tiles are resized according to the min_size and max_size parameters before feeding to the model. The computation time increases proportionally because of processing this extra data.

For example, when the tile_grid_size parameter is (3, 2), the computation time would be approximately seven times higher than without tiling.

You can specify the value for tile_grid_size in your hyperparameter space as a string.

parameter_space = {
    'model_name': choice('fasterrcnn_resnet50_fpn'),
    'tile_grid_size': choice('(3, 2)'),
    ...
}

The value for tile_grid_size parameter depends on the image dimensions and size of objects within the image. For example, larger number of tiles would be helpful when there are smaller objects in the images.

To choose the optimal value for this parameter for your dataset, you can use hyperparameter search. To do so, you can specify a choice of values for this parameter in your hyperparameter space.

parameter_space = {
    'model_name': choice('fasterrcnn_resnet50_fpn'),
    'tile_grid_size': choice('(2, 1)', '(3, 2)', '(5, 3)'),
    ...
}

Tiling during inference

When a model trained with tiling is deployed, tiling also occurs during inference. Automated ML uses the tile_grid_size value from training to generate the tiles during inference. The entire image and corresponding tiles are passed through the model, and the object proposals from them are merged to output final predictions, like in the following image.

Diagram that shows object proposals from image and tiles being merged to form the final predictions.

Note

It's possible that the same object is detected from multiple tiles, duplication detection is done to remove such duplicates.

Duplicate detection is done by running NMS on the proposals from the tiles and the image. When multiple proposals overlap, the one with the highest score is picked and others are discarded as duplicates.Two proposals are considered to be overlapping when the intersection over union (iou) between them is greater than the tile_predictions_nms_thresh parameter.

You also have the option to enable tiling only during inference without enabling it in training. To do so, set the tile_grid_size parameter only during inference, not for training.

Doing so, may improve performance for some datasets, and won't incur the extra cost that comes with tiling at training time.

Tiling hyperparameters

The following are the parameters you can use to control the tiling feature.

Parameter Name Description Default
tile_grid_size The grid size to use for tiling each image. Available for use during training, validation, and inference.

Tuple of two integers passed as a string, e.g '(3, 2)'

Note: Setting this parameter increases the computation time proportionally, since all tiles and images are processed by the model.
no default value
tile_overlap_ratio Controls the overlap ratio between adjacent tiles in each dimension. When the objects that fall on the tile boundary are too large to fit completely in one of the tiles, increase the value of this parameter so that the objects fit in at least one of the tiles completely.

Must be a float in [0, 1).
0.25
tile_predictions_nms_thresh The intersection over union threshold to use to do non-maximum suppression (nms) while merging predictions from tiles and image. Available during validation and inference. Change this parameter if there are multiple boxes detected per object in the final predictions.

Must be float in [0, 1].
0.25

Example notebooks

See the object detection sample notebook for detailed code examples of setting up and training an object detection model.

Note

All images in this article are made available in accordance with the permitted use section of the MIT licensing agreement. Copyright © 2020 Roboflow, Inc.

Next steps