Set up an image labeling project and export labels

Learn how to create and run data labeling projects to label images in Azure Machine Learning. Use machine learning (ML)-assisted data labeling or human-in-the-loop labeling to help with the task.

Set up labels for classification, object detection (bounding box), instance segmentation (polygon), or semantic segmentation (Preview).

You can also use the data labeling tool in Azure Machine Learning to create a text labeling project.


Items marked (preview) in this article are currently in public preview. The preview version is provided without a service level agreement, and it's not recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.

Image labeling capabilities

Azure Machine Learning data labeling is a tool you can use to create, manage, and monitor data labeling projects. Use it to:

  • Coordinate data, labels, and team members to efficiently manage labeling tasks.
  • Track progress and maintain the queue of incomplete labeling tasks.
  • Start and stop the project, and control the labeling progress.
  • Review and export the labeled data as an Azure Machine Learning dataset.


The data images you work with in the Azure Machine Learning data labeling tool must be available in an Azure Blob Storage datastore. If you don't have an existing datastore, you can upload your data files to a new datastore when you create a project.

Image data can be any file that has one of these file extensions:

  • .jpg
  • .jpeg
  • .png
  • .jpe
  • .jfif
  • .bmp
  • .tif
  • .tiff
  • .dcm
  • .dicom

Each file is an item to be labeled.
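Before you upload data, it can be useful to check which files the labeling tool will accept. The following sketch filters a file list down to the supported extensions listed above (the function and file names are illustrative, not part of any Azure Machine Learning API):

```python
from pathlib import Path

# File extensions the image labeling tool accepts (from the list above).
SUPPORTED_EXTENSIONS = {
    ".jpg", ".jpeg", ".png", ".jpe", ".jfif",
    ".bmp", ".tif", ".tiff", ".dcm", ".dicom",
}

def labelable_images(paths):
    """Return only the paths whose extension the labeling tool supports."""
    return [p for p in paths if Path(p).suffix.lower() in SUPPORTED_EXTENSIONS]

files = ["scan01.DCM", "photo.png", "notes.txt", "xray.tiff"]
print(labelable_images(files))  # ['scan01.DCM', 'photo.png', 'xray.tiff']
```

The extension check is case-insensitive, so files such as scan01.DCM pass the filter.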


You use these items to set up image labeling in Azure Machine Learning:

  • The data that you want to label, either in local files or in Azure Blob Storage.
  • The set of labels that you want to apply.
  • The instructions for labeling.
  • An Azure subscription. If you don't have an Azure subscription, create a free account before you begin.
  • An Azure Machine Learning workspace. See Create an Azure Machine Learning workspace.

Create an image labeling project

Labeling projects are administered in Azure Machine Learning. Use the Data Labeling page in Machine Learning to manage your projects.

If your data is already in Azure Blob Storage, make sure that it's available as a datastore before you create the labeling project.

  1. To create a project, select Add project.

  2. For Project name, enter a name for the project.

    You can't reuse the project name, even if you delete the project.

  3. To create an image labeling project, for Media type, select Image.

  4. For Labeling task type, select an option for your scenario:

    • To apply only a single label to an image from a set of labels, select Image Classification Multi-class.
    • To apply one or more labels to an image from a set of labels, select Image Classification Multi-label. For example, a photo of a dog might be labeled with both dog and daytime.
    • To assign a label to each object within an image and add bounding boxes, select Object Identification (Bounding Box).
    • To assign a label to each object within an image and draw a polygon around each object, select Instance Segmentation (Polygon).
    • To draw masks on an image and assign a label class at the pixel level, select Semantic Segmentation (Preview).

    Screenshot that shows creating a labeling project to manage labeling.

  5. Select Next to continue.

Add workforce (optional)

Select Use a vendor labeling company from Azure Marketplace only if you've engaged a data labeling company from Azure Marketplace. Then select the vendor. If your vendor doesn't appear in the list, clear this option.

Make sure that you first contact the vendor and sign a contract. For more information, see Work with a data labeling vendor company (preview).

Select Next to continue.

Specify the data to label

If you already created a dataset that contains your data, select the dataset in the Select an existing dataset dropdown. You can also select Create a dataset to use an existing Azure datastore or to upload local files.


A project can't contain more than 500,000 files. If your dataset exceeds this file count, only the first 500,000 files are loaded.

Create a dataset from an Azure datastore

In many cases, you can upload local files. However, Azure Storage Explorer provides a faster and more robust way to transfer a large amount of data. We recommend Storage Explorer as the default way to move files.

To create a dataset from data that's already stored in Blob Storage:

  1. Select Create.
  2. For Name, enter a name for your dataset. Optionally, enter a description.
  3. Ensure that Dataset type is set to File. Only file dataset types are supported for images.
  4. Select Next.
  5. Select From Azure storage, and then select Next.
  6. Select the datastore, and then select Next.
  7. If your data is in a subfolder within Blob Storage, choose Browse to select the path.
    • To include all the files in the subfolders of the selected path, append /** to the path.
    • To include all the data in the current container and its subfolders, append **/*.* to the path.
  8. Select Create.
  9. Select the data asset you created.

Create a dataset from uploaded data

To directly upload your data:

  1. Select Create.
  2. For Name, enter a name for your dataset. Optionally, enter a description.
  3. Ensure that Dataset type is set to File. Only file dataset types are supported for images.
  4. Select Next.
  5. Select From local files, and then select Next.
  6. (Optional) Select a datastore. You can also leave the default to upload to the default blob store (workspaceblobstore) for your Machine Learning workspace.
  7. Select Next.
  8. Select Upload > Upload files or Upload > Upload folder to select the local files or folders to upload.
  9. In the browser window, find your files or folders, and then select Open.
  10. Continue to select Upload until you specify all your files and folders.
  11. Optionally, select the Overwrite if already exists checkbox. Verify the list of files and folders.
  12. Select Next.
  13. Confirm the details. Select Back to modify the settings or select Create to create the dataset.
  14. Finally, select the data asset you created.

Configure incremental refresh

If you plan to add new data files to your dataset, use incremental refresh to add the files to your project.

When Enable incremental refresh at regular intervals is set, the dataset is checked periodically for new files to be added to a project based on the labeling completion rate. The check for new data stops when the project contains the maximum 500,000 files.

Select Enable incremental refresh at regular intervals when you want your project to continually monitor for new data in the datastore.

Clear the selection if you don't want new files in the datastore to automatically be added to your project.


Don't create a new version for the dataset you want to update. If you do, the updates won't be seen because the data labeling project is pinned to the initial version. Instead, use Azure Storage Explorer to modify your data in the appropriate folder in Blob Storage.

Also, don't remove data. Removing data from the dataset your project uses causes an error in the project.

After the project is created, use the Details tab to change incremental refresh, view the time stamp for the last refresh, and request an immediate refresh of data.

Specify label classes

On the Label categories page, specify a set of classes to categorize your data.

Your labelers' accuracy and speed are affected by their ability to choose among classes. For instance, instead of spelling out the full genus and species for plants or animals, use a field code or abbreviate the genus.

You can use a flat list of labels, or you can create groups of labels.

  • To create a flat list, select Add label category to create each label.

    Screenshot that shows how to add a flat structure of labels.

  • To create labels in different groups, select Add label category to create the top-level labels. Then select the plus sign (+) under each top level to create the next level of labels for that category. You can create up to six levels for any grouping.

    Screenshot that shows how to add groups of labels.

You can select labels at any level during the tagging process. For example, the labels Animal, Animal/Cat, Animal/Dog, Color, Color/Black, Color/White, and Color/Silver are all available choices for a label. In a multi-label project, there's no requirement to pick one of each category. If that is your intent, make sure to include this information in your instructions.
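The grouped labels above behave like a tree whose every node, including intermediate levels, is selectable. The following sketch (the nested structure and helper are hypothetical, not an Azure Machine Learning API) flattens such a tree into the path-style choices a labeler sees:

```python
# Hypothetical nested label structure mirroring the Animal/Color example above.
label_tree = {
    "Animal": {"Cat": {}, "Dog": {}},
    "Color": {"Black": {}, "White": {}, "Silver": {}},
}

def flatten(tree, prefix=""):
    """Yield every selectable label path, including intermediate levels."""
    for name, children in tree.items():
        path = f"{prefix}/{name}" if prefix else name
        yield path
        yield from flatten(children, path)

print(sorted(flatten(label_tree)))
# ['Animal', 'Animal/Cat', 'Animal/Dog', 'Color', 'Color/Black',
#  'Color/Silver', 'Color/White']
```

Note that the top-level labels Animal and Color appear in the output: a labeler can stop at any level of the hierarchy.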

Describe the image labeling task

It's important to clearly explain the labeling task. On the Labeling instructions page, you can add a link to an external site that has labeling instructions, or you can provide instructions in the edit box on the page. Keep the instructions task-oriented and appropriate to the audience. Consider these questions:

  • What labels will labelers see, and how will they choose among them? Is there a reference document they can consult?
  • What should they do if no label seems appropriate?
  • What should they do if multiple labels seem appropriate?
  • What confidence threshold should they apply to a label? Do you want the labeler's best guess if they aren't certain?
  • What should they do with partially occluded or overlapping objects of interest?
  • What should they do if an object of interest is clipped by the edge of the image?
  • What should they do if they think they made a mistake after they submit a label?
  • What should they do if they discover image quality issues, including poor lighting conditions, reflections, loss of focus, undesired background included, abnormal camera angles, and so on?
  • What should they do if multiple reviewers have different opinions about applying a label?

For bounding boxes, important questions include:

  • How is the bounding box defined for this task? Should it stay entirely on the interior of the object or should it be on the exterior? Should it be cropped as closely as possible, or is some clearance acceptable?
  • What level of care and consistency do you expect the labelers to apply in defining bounding boxes?
  • What is the visual definition of each label class? Can you provide a list of normal, edge, and counter cases for each class?
  • What should the labelers do if the object is tiny? Should it be labeled as an object or should they ignore that object as background?
  • How should labelers handle an object that's only partially shown in the image?
  • How should labelers handle an object that's partially covered by another object?
  • How should labelers handle an object that has no clear boundary?
  • How should labelers handle an object that isn't the object class of interest but has visual similarities to a relevant object type?


Labelers can select the first nine labels by using number keys 1 through 9.

Quality control (preview)

To get more accurate labels, use the Quality control page to send each item to multiple labelers.


Consensus labeling is currently in public preview.

The preview version is provided without a service level agreement, and it's not recommended for production workloads. Certain features might not be supported or might have constrained capabilities.

For more information, see Supplemental Terms of Use for Microsoft Azure Previews.

To have each item sent to multiple labelers, select Enable consensus labeling (preview). Then set values for Minimum labelers and Maximum labelers to specify how many labelers to use. Make sure that you have as many labelers available as your maximum number. You can't change these settings after the project has started.

If a consensus is reached from the minimum number of labelers, the item is labeled. If a consensus isn't reached, the item is sent to more labelers. If there's no consensus after the item goes to the maximum number of labelers, its status is Needs Review, and the project owner is responsible for labeling the item.
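The flow above can be sketched as a small state function. The exact consensus rule the service applies isn't specified here, so this approximation (majority agreement once the minimum labeler count is reached) is illustrative only:

```python
from collections import Counter

def consensus_status(votes, min_labelers, max_labelers):
    """Illustrative consensus check, not the service's actual rule:
    a label wins once the minimum labeler count is reached and a strict
    majority of the votes agree; exhausting the maximum without agreement
    sends the item to review."""
    if not votes:
        return ("pending", None)
    label, count = Counter(votes).most_common(1)[0]
    if len(votes) >= min_labelers and count > len(votes) / 2:
        return ("labeled", label)
    if len(votes) >= max_labelers:
        return ("needs_review", None)
    return ("pending", None)

print(consensus_status(["dog", "dog", "cat"], min_labelers=3, max_labelers=5))
# ('labeled', 'dog')
```

With an even split such as ["dog", "cat", "dog", "cat"] and a maximum of four labelers, the item ends as needs_review, which the project owner then resolves.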


Instance Segmentation projects can't use consensus labeling.

Use ML-assisted data labeling

To accelerate labeling tasks, on the ML assisted labeling page, you can trigger automatic machine learning models. Medical images (files that have a .dcm extension) aren't included in assisted labeling. If the project type is Semantic Segmentation (Preview), ML-assisted labeling isn't available.

At the start of your labeling project, the items are shuffled into a random order to reduce potential bias. However, the trained model reflects any biases that are present in the dataset. For example, if 80 percent of your items are of a single class, then approximately 80 percent of the data used to train the model lands in that class.

To enable assisted labeling, select Enable ML assisted labeling and specify a GPU. If you don't have a GPU in your workspace, a GPU cluster is created for you and added to your workspace. The cluster is created with a minimum of zero nodes, which means it costs nothing when not in use.

ML-assisted labeling consists of two phases:

  • Clustering
  • Pre-labeling

The labeled data item count that's required to start assisted labeling isn't a fixed number. This number can vary significantly from one labeling project to another. For some projects, it's sometimes possible to see pre-label or cluster tasks after 300 items have been manually labeled. ML-assisted labeling uses a technique called transfer learning. Transfer learning uses a pre-trained model to jump-start the training process. If the classes of your dataset resemble the classes in the pre-trained model, pre-labels might become available after only a few hundred manually labeled items. If your dataset significantly differs from the data that's used to pre-train the model, the process might take more time.

When you use consensus labeling, the consensus label is used for training.

Because the final labels still rely on input from the labeler, this technology is sometimes called human-in-the-loop labeling.


ML-assisted data labeling doesn't support default storage accounts that are secured behind a virtual network. You must use a non-default storage account for ML-assisted data labeling. The non-default storage account can be secured behind the virtual network.


Clustering

After you submit some labels, the classification model starts to group together similar items. These similar images are presented to labelers on the same page to help make manual tagging more efficient. Clustering is especially useful when a labeler views a grid of four, six, or nine images.

After a machine learning model is trained on your manually labeled data, the model is truncated to its last fully connected layer. Unlabeled images are then passed through the truncated model in a process called embedding or featurization. This process embeds each image in a high-dimensional space that the model layer defines. Other images in the space that are nearest the image are used for clustering tasks.
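As a sketch of the featurization idea (not the service's actual implementation), the following uses random vectors as stand-in embeddings and finds each image's nearest neighbors in that space, which is the basis for grouping similar images on one page:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for embeddings produced by the truncated model: 8 images
# in a 4-dimensional feature space (real embeddings are much larger).
embeddings = rng.normal(size=(8, 4))

def nearest_neighbors(embeddings, index, k=3):
    """Return the indices of the k images closest to `index` in embedding space."""
    distances = np.linalg.norm(embeddings - embeddings[index], axis=1)
    order = np.argsort(distances)
    return [i for i in order if i != index][:k]

# Images that might be shown alongside image 0 on a clustering page.
print(nearest_neighbors(embeddings, 0))
```

Because distance in the embedding space approximates visual similarity, a page built from an image and its nearest neighbors tends to need the same label repeatedly, which speeds up manual tagging.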

The clustering phase doesn't appear for object detection models or text classification.


Pre-labeling

After you submit enough labels for training, either a classification model predicts tags or an object detection model predicts bounding boxes. The labeler now sees pages that contain predicted labels already present on each item. For object detection, predicted boxes are also shown. The task involves reviewing these predictions and correcting any incorrectly labeled images before page submission.

After a machine learning model is trained on your manually labeled data, the model is evaluated on a test set of manually labeled items. The evaluation helps determine the model's accuracy at different confidence thresholds. The evaluation process sets a confidence threshold beyond which the model is accurate enough to show pre-labels. The model is then evaluated against unlabeled data. Items with predictions that are more confident than the threshold are used for pre-labeling.
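The threshold selection can be sketched as follows. The real evaluation pipeline is internal to the service; this illustration (with made-up validation pairs of model confidence and correctness) just shows the shape of the logic:

```python
def choose_threshold(validation, target_precision=0.9):
    """Illustrative threshold search: pick the lowest confidence at which
    validation precision meets a target. `validation` is a list of
    (confidence, was_correct) pairs; the values here are made up."""
    for threshold in sorted({conf for conf, _ in validation}):
        kept = [ok for conf, ok in validation if conf >= threshold]
        if kept and sum(kept) / len(kept) >= target_precision:
            return threshold
    return None

validation = [(0.55, False), (0.70, True), (0.80, True), (0.95, True)]
threshold = choose_threshold(validation)

# Only unlabeled items predicted above the threshold receive pre-labels.
predictions = [("img1.png", 0.85), ("img2.png", 0.40)]
prelabels = [name for name, conf in predictions if conf >= threshold]
print(threshold, prelabels)  # 0.7 ['img1.png']
```

Items below the threshold stay in the manual queue, so labelers spend their time where the model is least confident.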

Initialize the image labeling project

After the labeling project is initialized, some aspects of the project are immutable. You can't change the task type or dataset. You can modify labels and the URL for the task description. Carefully review the settings before you create the project. After you submit the project, you return to the Data Labeling overview page, which shows the project as Initializing.


This page might not automatically refresh. After a pause, manually refresh the page to see the project's status as Created.

Run and monitor the project

After you initialize the project, Azure begins to run it. To see the project details, select the project on the main Data Labeling page.

To pause or restart the project, on the project command bar, toggle the Running status. You can label data only when the project is running.


The Dashboard tab shows the progress of the labeling task.

Screenshot that shows the data labeling dashboard.

The progress charts show how many items are labeled, skipped, in need of review, or not yet complete. Hover over the chart to see the number of items in each section.

A distribution of the labels for completed tasks is shown below the chart. In some project types, an item can have multiple labels. The total number of labels can exceed the total number of items.

The dashboard also shows the distribution of labelers and how many items each labeler has labeled.

The middle section shows a table that has a queue of unassigned tasks. When ML-assisted labeling is off, this section shows the number of manual tasks that are awaiting assignment.

When ML-assisted labeling is on, this section also shows:

  • Tasks that contain clustered items in the queue.
  • Tasks that contain pre-labeled items in the queue.

Additionally, when ML-assisted labeling is enabled, you can scroll down to see the ML-assisted labeling status. The Jobs section provides links to each of the machine learning runs:

  • Training: Trains a model to predict the labels.
  • Validation: Determines whether item pre-labeling uses the prediction of this model.
  • Inference: Prediction run for new items.
  • Featurization: Clusters items (only for image classification projects).

Data tab

On the Data tab, you can see your dataset and review labeled data. Scroll through the labeled data to see the labels. If you see data that's incorrectly labeled, select it and choose Reject to remove the labels and return the data to the unlabeled queue.

If your project uses consensus labeling, review images that have no consensus:

  1. Select the Data tab.

  2. On the left menu, select Review labels.

  3. On the command bar above Review labels, select All filters.

    Screenshot that shows how to select filters to review consensus label problems.

  4. Under Labeled datapoints, select Consensus labels in need of review to show only images for which the labelers didn't come to a consensus.

    Screenshot that shows how to select labels in need of review.

  5. For each image to review, select the Consensus label dropdown to view the conflicting labels.

    Screenshot that shows the Select Consensus label dropdown to review conflicting labels.

  6. You can select an individual labeler to see their labels. However, to update or reject the labels, you must use the top choice, Consensus label (preview).

Details tab

View and change details of your project. On this tab, you can:

  • View project details and input datasets.
  • Set or clear the Enable incremental refresh at regular intervals option, or request an immediate refresh.
  • View details of the storage container that's used to store labeled outputs in your project.
  • Add labels to your project.
  • Edit instructions you give to your labelers.
  • Change settings for ML-assisted labeling and kick off a labeling task.

Vision Studio tab

If your project was created from Vision Studio, you'll also see a Vision Studio tab. Select Go to Vision Studio to return to Vision Studio, where you can import your labeled data.

Access for labelers

Anyone who has Contributor or Owner access to your workspace can label data in your project.

You can also add users and customize the permissions so that they can access labeling but not other parts of the workspace or your labeling project. For more information, see Add users to your data labeling project.

Add new labels to a project

During the data labeling process, you might want to add more labels to classify your items. For example, you might want to add an Unknown or Other label to indicate confusion.

To add one or more labels to a project:

  1. On the main Data Labeling page, select the project.

  2. On the project command bar, toggle the status from Running to Paused to stop labeling activity.

  3. Select the Details tab.

  4. In the list on the left, select Label categories.

  5. Modify your labels.

    Screenshot that shows how to add a label in Machine Learning Studio.

  6. In the form, add your new label. Because you've changed the available labels, choose how to treat data that's already labeled:

    • Start over, and remove all existing labels. Choose this option if you want to start labeling from the beginning by using the new full set of labels.
    • Start over, and keep all existing labels. Choose this option to mark all data as unlabeled, but keep the existing labels as a default tag for images that were previously labeled.
    • Continue, and keep all existing labels. Choose this option to keep all data already labeled as it is, and start using the new label for data that's not yet labeled.
  7. Modify your instructions page as necessary for new labels.

  8. After you've added all new labels, toggle Paused to Running to restart the project.

Start an ML-assisted labeling task

ML-assisted labeling starts automatically after some items have been labeled. This automatic threshold varies by project. You can manually start an ML-assisted training run if your project contains at least some labeled data.


On-demand training is not available for projects created before December 2022. To use this feature, create a new project.

To start a new ML-assisted training run:

  1. At the top of your project, select Details.
  2. On the left menu, select ML assisted labeling.
  3. Near the bottom of the page, for On-demand training, select Start.

Export the labels

To export the labels, on the Project details page of your labeling project, select the Export button. You can export the label data for Machine Learning experimentation at any time.

If your project type is Semantic segmentation (Preview), an Azure MLTable data asset is created.

For all other project types, you can export an image label as:

  • A CSV file. Azure Machine Learning creates the CSV file in a folder inside Labeling/export/csv.
  • A COCO format file. Azure Machine Learning creates the COCO file in a folder inside Labeling/export/coco.
  • An Azure MLTable data asset.
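An exported COCO file follows the standard COCO layout of images, annotations, and categories, with bounding boxes given as [x, y, width, height]. The following sketch parses a minimal in-memory example in that layout; an actual export may carry additional fields, so treat the keys shown here as the common subset rather than an exhaustive schema:

```python
# A minimal example in standard COCO layout. In practice you'd obtain this
# dictionary with json.load() on the downloaded export file.
coco = {
    "images": [{"id": 1, "file_name": "dog01.jpg", "width": 640, "height": 480}],
    "categories": [{"id": 1, "name": "dog"}],
    "annotations": [
        # bbox is [x, y, width, height] in the standard COCO convention.
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [100, 80, 200, 150]}
    ],
}

# Build lookup tables, then resolve each annotation to a readable triple.
categories = {c["id"]: c["name"] for c in coco["categories"]}
images = {i["id"]: i["file_name"] for i in coco["images"]}
for ann in coco["annotations"]:
    print(images[ann["image_id"]], categories[ann["category_id"]], ann["bbox"])
# dog01.jpg dog [100, 80, 200, 150]
```

Because annotations reference images and categories by ID, building the two lookup dictionaries first keeps the join logic simple even for large exports.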

When you export a CSV or COCO file, a notification appears briefly when the file is ready to download. Select the Download file link to download your results. You'll also find the notification in the Notification section on the top bar:

Screenshot that shows the notification for the file download.

Access exported Azure Machine Learning datasets and data assets in the Data section of Machine Learning. The data details page also provides sample code you can use to access your labels by using Python.

Screenshot that shows an example of the dataset details page in Machine Learning.

After you export your labeled data to an Azure Machine Learning dataset, you can use AutoML to build computer vision models that are trained on your labeled data. Learn more at Set up AutoML to train computer vision models by using Python.

Troubleshoot issues

Use these tips if you see any of the following issues:

Issue: Only datasets created on blob datastores can be used.
Resolution: This issue is a known limitation of the current release.

Issue: Removing data from the dataset your project uses causes an error in the project.
Resolution: Don't remove data from the version of the dataset you used in a labeling project. Create a new version of the dataset instead.

Issue: After a project is created, the project status is Initializing for an extended time.
Resolution: Manually refresh the page. Initialization should complete at roughly 20 data points per second. The lack of automatic refresh is a known issue.

Issue: Newly labeled items aren't visible in data review.
Resolution: To load all labeled items, select the First button. The First button takes you back to the front of the list, and it loads all labeled data.

Issue: You can't assign a set of tasks to a specific labeler.
Resolution: This issue is a known limitation of the current release.

Troubleshoot object detection

Issue: If you select the Esc key when you label for object detection, a zero-size label is created and label submission fails.
Resolution: To delete the label, select the X delete icon next to the label.

Next steps