Create and explore Azure Machine Learning datasets with labels

2025-05-02

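Important

This article provides information on using the Azure Machine Learning SDK v1. SDK v1 has been deprecated since March 31, 2025, and support for it ends on June 30, 2026. You can install and use SDK v1 until that date.

We recommend that you transition to SDK v2 before June 30, 2026. For more information on SDK v2, see "What is Azure Machine Learning Python SDK v2" and the SDK v2 reference.

In this article, you learn how to export the data labels from an Azure Machine Learning data labeling project and load them into popular formats, such as a pandas dataframe, for data exploration.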

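What are datasets with labels

Azure Machine Learning datasets with labels are referred to as labeled datasets. A labeled dataset is a TabularDataset with a dedicated label column, and it is created only as the output of an Azure Machine Learning data labeling project. To get one, create a data labeling project for image labeling or text labeling. Machine Learning supports data labeling projects for image classification (multi-label or multi-class) and for object identification with bounding boxes.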

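Prerequisites

An Azure subscription. If you don't have an Azure subscription, create a free account before you begin.
The Azure Machine Learning SDK for Python, or access to Azure Machine Learning studio.
A Machine Learning workspace. See Create workspace resources.
Access to an Azure Machine Learning data labeling project. If you don't have a labeling project, first create one for image labeling or text labeling.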

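Export data labels

When you complete a data labeling project, you can export the label data from the project. The export captures both the reference to the data and its labels, in either COCO format or as an Azure Machine Learning dataset.

Use the Export button on the Project details page of your labeling project.

COCO

The COCO file is created in the default blob store of the Azure Machine Learning workspace, in a folder within export/coco.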
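Once the COCO file has been downloaded from the blob store, it can be read with the standard json module. The following is a minimal sketch, assuming the export follows the usual COCO layout with top-level "images", "annotations", and "categories" lists. The inline sample document, its file name, and the helper annotations_by_image are illustrative and do not come from a real export.

```python
import json
from collections import defaultdict

# Illustrative stand-in for an exported COCO file. The file name, ids,
# and category names below are made up for this sketch.
SAMPLE_COCO = """
{
  "images": [{"id": 1, "file_name": "img_001.jpg", "width": 0.0, "height": 0.0}],
  "categories": [{"id": 1, "name": "cat"}, {"id": 2, "name": "dog"}],
  "annotations": [
    {"id": 1, "image_id": 1, "category_id": 1, "bbox": [0.1, 0.2, 0.3, 0.4]},
    {"id": 2, "image_id": 1, "category_id": 2, "bbox": [0.5, 0.5, 0.25, 0.25]}
  ]
}
"""


def annotations_by_image(coco):
    """Map each image file name to a list of (category name, bbox) pairs."""
    file_names = {img["id"]: img["file_name"] for img in coco["images"]}
    category_names = {cat["id"]: cat["name"] for cat in coco["categories"]}
    grouped = defaultdict(list)
    for ann in coco["annotations"]:
        grouped[file_names[ann["image_id"]]].append(
            (category_names[ann["category_id"]], ann["bbox"])
        )
    return dict(grouped)


# For a real export, replace json.loads(...) with json.load(open(path)).
coco = json.loads(SAMPLE_COCO)
labels = annotations_by_image(coco)
print(labels)
```

Grouping by file name rather than by image id keeps the result readable when you inspect it by hand.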

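Note

In object detection projects, the exported "bbox": [x,y,width,height] values in the COCO file are normalized: x and width are divided by the image width, and y and height by the image height, so every value is scaled to the range 0 to 1. For example, a bounding box at location (10, 10), 30 pixels wide and 60 pixels high, in a 640x480 pixel image is annotated as (0.015625, 0.02083, 0.046875, 0.125). Because the coordinates are normalized, the file reports 0.0 as the "width" and "height" of every image. You can get the actual width and height with a Python library such as OpenCV or Pillow (PIL).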
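The normalization described in the note can be reversed by multiplying each value by the matching image dimension. The following is a minimal sketch. The helper name denormalize_bbox is hypothetical, and the image size is passed in directly, whereas in practice it would be read from the image file.

```python
def denormalize_bbox(bbox, image_width, image_height):
    """Convert a normalized [x, y, width, height] box to whole pixels."""
    x, y, w, h = bbox
    return (
        round(x * image_width),   # left edge
        round(y * image_height),  # top edge
        round(w * image_width),   # box width
        round(h * image_height),  # box height
    )


# With Pillow, the real size can be read as:
#   width, height = PIL.Image.open(path).size
# Here the 640x480 size from the note is used directly.
print(denormalize_bbox((0.015625, 0.02083, 0.046875, 0.125), 640, 480))
# → (10, 10, 30, 60)
```

Rounding to the nearest pixel absorbs the truncation in values such as 0.02083.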

Azure Machine Learning dataset

You can access the exported Azure Machine Learning dataset in the Datasets section of Azure Machine Learning studio. The dataset details page also provides sample code for accessing your labels from Python.

Figure: The exported dataset.

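Tip

After you export your labeled data to an Azure Machine Learning dataset, you can use AutoML to build computer vision models trained on that data. To learn more, see Set up AutoML to train computer vision models with Python.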

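Explore labeled datasets via pandas dataframe

Load your labeled dataset into a pandas dataframe with the to_pandas_dataframe() method, so that you can use popular open-source libraries for data exploration. The method requires the azureml-dataprep package.

Install the package with the following shell command: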

pip install azureml-dataprep

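In the following code, the animal_labels dataset is the output of a labeling project that was previously saved to the workspace. The exported dataset is a TabularDataset.

Applies to: Python SDK azureml v1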

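import azureml.core
from azureml.core import Dataset, Workspace

# connect to the workspace; assumes a config.json for it is available locally
workspace = Workspace.from_config()

# get animal_labels dataset from the workspace
animal_labels = Dataset.get_by_name(workspace, 'animal_labels')

# load the labeled dataset into a pandas dataframe
animal_pd = animal_labels.to_pandas_dataframe()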
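A common first exploration step is to check how the labels are distributed. The following is a minimal sketch that uses only the standard library and assumes a classification project, where each row has either a single label string (multi-class) or a list of label strings (multi-label). The helper label_distribution and the sample values are hypothetical. The column name "label" in the usage comment is also an assumption, so check animal_pd.columns for the actual name.

```python
from collections import Counter


def label_distribution(labels):
    """Count labels across rows; a row is one string or a list of strings."""
    counts = Counter()
    for value in labels:
        if isinstance(value, str):
            counts[value] += 1    # multi-class: one label per row
        else:
            counts.update(value)  # multi-label: several labels per row
    return counts


# Made-up stand-in values; with the dataframe loaded above, the call
# might look like label_distribution(animal_pd["label"]) instead.
sample_labels = ["cat", "dog", ["cat", "bird"], "cat"]
distribution = label_distribution(sample_labels)
print(distribution.most_common())
```

Object detection labels carry bounding box details as well as class names, so they would need a different extraction step before counting.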