Create and explore Azure Machine Learning dataset with labels
In this article, you'll learn how to export the data labels from an Azure Machine Learning data labeling project and load them into popular formats, such as a pandas dataframe, for data exploration.
What are datasets with labels
Azure Machine Learning datasets with labels are referred to as labeled datasets. These datasets are TabularDatasets with a dedicated label column, and they are created only as an output of Azure Machine Learning data labeling projects. Create a data labeling project for image labeling or text labeling. Azure Machine Learning supports data labeling projects for image classification (multi-label or multi-class) and for object identification with bounding boxes.
Prerequisites
- An Azure subscription. If you don’t have an Azure subscription, create a free account before you begin.
- The Azure Machine Learning SDK for Python, or access to Azure Machine Learning studio.
- A Machine Learning workspace. See Create workspace resources.
- Access to an Azure Machine Learning data labeling project. If you don't have a labeling project, first create one for image labeling or text labeling.
Export data labels
When you complete a data labeling project, you can export its label data. Doing so captures both the reference to the data and its labels, and exports them in COCO format or as an Azure Machine Learning dataset.
Use the Export button on the Project details page of your labeling project.
COCO
The COCO file is created in the default blob store of the Azure Machine Learning workspace in a folder within export/coco.
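After you download the exported file from the blob store, you can get a quick overview of it with the standard library. The sketch below assumes the file follows the standard COCO layout, with top-level "images", "annotations" and "categories" keys. The helper names and the sample data are hypothetical, and the blob download step is not shown.

```python
import json
from collections import Counter


def load_coco(path: str) -> dict:
    """Load a COCO JSON file that has already been downloaded to a local path."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize_coco(coco: dict) -> dict:
    """Count images, annotations, and annotations per category name."""
    # Map each category id to its human-readable name
    names = {c["id"]: c["name"] for c in coco.get("categories", [])}
    annotations = coco.get("annotations", [])
    # Fall back to the raw id if a category is missing from the list
    per_category = Counter(
        names.get(a["category_id"], str(a["category_id"])) for a in annotations
    )
    return {
        "images": len(coco.get("images", [])),
        "annotations": len(annotations),
        "per_category": dict(per_category),
    }


# Hypothetical in-memory sample in COCO layout (not real export output)
sample = {
    "images": [{"id": 1, "file_name": "img1.jpg"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [0.1, 0.1, 0.2, 0.2]},
        {"id": 2, "image_id": 1, "category_id": 2, "bbox": [0.5, 0.5, 0.1, 0.1]},
    ],
    "categories": [{"id": 1, "name": "cat"}, {"id": 2, "name": "dog"}],
}
print(summarize_coco(sample))
```

With a real export, call summarize_coco(load_coco(local_path)), where local_path is wherever you saved the file.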
Note
In object detection projects, the exported "bbox": [x,y,width,height] values in the COCO file are normalized, which means they are scaled to 1. For example, a bounding box at location (10, 10) that is 30 pixels wide and 60 pixels high, in a 640x480 pixel image, is annotated as (0.015625, 0.02083, 0.046875, 0.125). Because the coordinates are normalized, the "width" and "height" values show as 0.0 for all images. You can obtain the actual width and height with a Python library such as OpenCV or Pillow (PIL).
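The normalization described in the note is plain arithmetic. The x coordinate and the width are divided by the image width, and the y coordinate and the height are divided by the image height. The sketch below reproduces the note's worked example. The function names are hypothetical, and the sketch assumes you already know the image's pixel dimensions.

```python
def bbox_to_normalized(bbox, image_width, image_height):
    """Scale a pixel bbox [x, y, width, height] to the 0..1 range."""
    x, y, w, h = bbox
    # x and width scale with image width; y and height scale with image height
    return [x / image_width, y / image_height, w / image_width, h / image_height]


def bbox_to_pixels(bbox, image_width, image_height):
    """Convert a normalized bbox [x, y, width, height] back to pixel units."""
    x, y, w, h = bbox
    return [x * image_width, y * image_height, w * image_width, h * image_height]


# Worked example from the note: a 30x60 box at (10, 10) in a 640x480 image
normalized = bbox_to_normalized([10, 10, 30, 60], 640, 480)
pixels = bbox_to_pixels(normalized, 640, 480)
print([round(v, 5) for v in normalized])
print([round(v, 2) for v in pixels])
```

The exported image entries report a width and height of 0.0, so the dimensions have to come from the image file itself. With Pillow, for example, Image.open(path).size returns a (width, height) tuple that you can pass to bbox_to_pixels.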
Azure Machine Learning dataset
You can access the exported Azure Machine Learning dataset in the Datasets section of your Azure Machine Learning studio. The dataset Details page also provides sample code to access your labels from Python.
Tip
Once you have exported your labeled data to an Azure Machine Learning dataset, you can use AutoML to build computer vision models trained on that data. Learn more at Set up AutoML to train computer vision models with Python.
Explore labeled datasets via pandas dataframe
Load your labeled datasets into a pandas dataframe with the to_pandas_dataframe() method, which relies on the azureml-dataprep package, and then explore the data with popular open-source libraries.
Install the package with the following shell command:
pip install azureml-dataprep
In the following code, the animal_labels dataset is the output from a labeling project previously saved to the workspace.
The exported dataset is a TabularDataset.
APPLIES TO: Python SDK azureml v1
from azureml.core import Dataset, Workspace
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

# connect to the workspace (reads config.json from the current directory or a parent)
workspace = Workspace.from_config()

# get animal_labels dataset from the workspace
animal_labels = Dataset.get_by_name(workspace, 'animal_labels')
animal_pd = animal_labels.to_pandas_dataframe()

# read the first image referenced by the dataset and display it
img = mpimg.imread(animal_pd['image_url'].iloc[0].open())
imgplot = plt.imshow(img)
plt.show()
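Once the labels are in a dataframe, ordinary pandas operations are enough for a first look at the label distribution. The sketch below assumes the label column is named "label" and that a multi-label project stores each row's labels as a list. Both are assumptions, so check the actual column names in animal_pd. The sample dataframe is hypothetical.

```python
import pandas as pd


def label_counts(df: pd.DataFrame, label_column: str = "label") -> pd.Series:
    """Count how often each label occurs across all rows."""
    # explode() gives each list element its own row; scalar labels
    # (as in multi-class projects) pass through unchanged
    return df[label_column].explode().value_counts()


# Hypothetical multi-label sample shaped like animal_pd; real columns may differ
sample = pd.DataFrame({
    "image_url": ["a.jpg", "b.jpg", "c.jpg", "d.jpg"],
    "label": [["cat"], ["cat", "dog"], ["dog"], ["bird"]],
})
print(label_counts(sample))
```

Counting labels this way quickly exposes class imbalance, which is useful to know before you train a model on the exported data.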