Test recognition quality of a custom speech model

Article
10/16/2024

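You can inspect the recognition quality of a custom speech model in Speech Studio. You can play back uploaded audio and determine whether the provided recognition result is correct. After a test is successfully created, you can see how a model transcribed the audio dataset, or compare the results of two models side by side.

Side-by-side model testing helps you validate which speech recognition model is best for an application. For an objective measure of accuracy, which requires transcribed dataset input, see Test model quantitatively.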

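Important

The system performs a transcription whenever you run a test. Keep this in mind, because pricing varies by service offering and subscription level. Always refer to the official Azure AI services pricing for the latest details.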

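Create a test

Follow these instructions to create a test:

Sign in to Speech Studio.
Go to [Speech Studio] > [Custom speech], and then select your project name from the list.
Select [Test models] > [Create new test].
Select [Inspect quality (Audio-only data)] > [Next].
Choose the audio dataset that you want to use for testing, and then select [Next]. If no datasets are available, cancel the setup, and then go to the [Speech datasets] menu to upload datasets.
Choose one or two models to evaluate and compare accuracy.
Enter the test name and description, and then select [Next].
Review your settings, and then select [Save and close].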

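To create a test, use the spx csr evaluation create command. Construct the request parameters according to the following instructions:

Set the project parameter to the ID of an existing project. This parameter is recommended so that you can also view the test in Speech Studio. You can run the spx csr project list command to get available projects.
Set the required model1 parameter to the ID of a model that you want to test.
Set the required model2 parameter to the ID of another model that you want to test. If you don't want to compare two models, use the same model for both model1 and model2.
Set the required dataset parameter to the ID of the dataset that you want to use for the test.
Set the language parameter; otherwise, the Speech CLI sets "en-US" by default. This parameter should be the locale of the dataset contents. The locale can't be changed later. The Speech CLI language parameter corresponds to the locale property in the JSON request and response.
Set the required name parameter. This parameter is the name that is displayed in Speech Studio. The Speech CLI name parameter corresponds to the displayName property in the JSON request and response.

Here's an example Speech CLI command that creates a test: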

spx csr evaluation create --api-version v3.2 --project 0198f569-cc11-4099-a0e8-9d55bc3d0c52 --dataset 23b6554d-21f9-4df1-89cb-f84510ac8d23 --model1 13fb305e-09ad-4bce-b3a1-938c9124dda3 --model2 13fb305e-09ad-4bce-b3a1-938c9124dda3 --name "My Inspection" --description "My Inspection Description"
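If the command is generated from a script rather than typed by hand, the parameter list above can be assembled programmatically. The following is a minimal Python sketch, not part of the Speech CLI: the helper name build_evaluation_create_args is hypothetical, and it only emits the flags already shown in the example command.

```python
import shlex

def build_evaluation_create_args(params, api_version="v3.2"):
    """Return the argument list for `spx csr evaluation create`.

    `params` maps Speech CLI parameter names (project, dataset, model1,
    model2, name, description) to their values. Hypothetical helper.
    """
    args = ["spx", "csr", "evaluation", "create", "--api-version", api_version]
    for name, value in params.items():
        args += [f"--{name}", value]
    return args

# IDs copied from the example command in this article.
params = {
    "project": "0198f569-cc11-4099-a0e8-9d55bc3d0c52",
    "dataset": "23b6554d-21f9-4df1-89cb-f84510ac8d23",
    "model1": "13fb305e-09ad-4bce-b3a1-938c9124dda3",
    # Same model twice: no side-by-side comparison is requested.
    "model2": "13fb305e-09ad-4bce-b3a1-938c9124dda3",
    "name": "My Inspection",
    "description": "My Inspection Description",
}

command = build_evaluation_create_args(params)
# shlex.join quotes values that contain spaces, so the line can be pasted into a shell.
print(shlex.join(command))
# To execute it (assumes spx is installed and configured):
# subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell-quoting problems with names and descriptions that contain spaces.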

You should receive a response body in the following format:

{
  "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/evaluations/9c06d5b1-213f-4a16-9069-bc86efacdaac",
  "model1": {
    "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/models/base/13fb305e-09ad-4bce-b3a1-938c9124dda3"
  },
  "model2": {
    "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/models/base/13fb305e-09ad-4bce-b3a1-938c9124dda3"
  },
  "dataset": {
    "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/datasets/23b6554d-21f9-4df1-89cb-f84510ac8d23"
  },
  "transcription2": {
    "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/transcriptions/b50642a8-febf-43e1-b9d3-e0c90b82a62a"
  },
  "transcription1": {
    "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/transcriptions/b50642a8-febf-43e1-b9d3-e0c90b82a62a"
  },
  "project": {
    "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/projects/0198f569-cc11-4099-a0e8-9d55bc3d0c52"
  },
  "links": {
    "files": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/evaluations/9c06d5b1-213f-4a16-9069-bc86efacdaac/files"
  },
  "properties": {
    "wordErrorRate1": -1.0,
    "sentenceErrorRate1": -1.0,
    "sentenceCount1": -1,
    "wordCount1": -1,
    "correctWordCount1": -1,
    "wordSubstitutionCount1": -1,
    "wordDeletionCount1": -1,
    "wordInsertionCount1": -1,
    "wordErrorRate2": -1.0,
    "sentenceErrorRate2": -1.0,
    "sentenceCount2": -1,
    "wordCount2": -1,
    "correctWordCount2": -1,
    "wordSubstitutionCount2": -1,
    "wordDeletionCount2": -1,
    "wordInsertionCount2": -1
  },
  "lastActionDateTime": "2024-07-14T21:21:39Z",
  "status": "NotStarted",
  "createdDateTime": "2024-07-14T21:21:39Z",
  "locale": "en-US",
  "displayName": "My Inspection",
  "description": "My Inspection Description"
}

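The top-level self property in the response body is the evaluation's URI. Use this URI to get details about the evaluation's project and test results. You can also use this URI to update or delete the evaluation.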
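The evaluation ID that later commands expect is the last path segment of that self URI. A minimal Python sketch of extracting it follows; the function name evaluation_id_from_self is hypothetical, and the URI is the one from the sample response.

```python
from urllib.parse import urlparse

def evaluation_id_from_self(self_uri):
    """Return the evaluation ID, the last path segment of the `self` URI."""
    parts = urlparse(self_uri).path.rstrip("/").split("/")
    if len(parts) < 2 or parts[-2] != "evaluations":
        raise ValueError(f"not an evaluation URI: {self_uri}")
    return parts[-1]

self_uri = (
    "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/"
    "evaluations/9c06d5b1-213f-4a16-9069-bc86efacdaac"
)
evaluation_id = evaluation_id_from_self(self_uri)
print(evaluation_id)
```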

For Speech CLI help with evaluations, run the following command:

spx help csr evaluation

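To create a test, use the Evaluations_Create operation of the Speech to text REST API. Construct the request body according to the following instructions:

Set the project property to the URI of an existing project. This property is recommended so that you can also view the test in Speech Studio. You can make a Projects_List request to get available projects.
Set the required model1 property to the URI of a model that you want to test.
Set the required model2 property to the URI of another model that you want to test. If you don't want to compare two models, use the same model for both model1 and model2.
Set the required dataset property to the URI of the dataset that you want to use for the test.
Set the required locale property. This property should be the locale of the dataset contents. The locale can't be changed later.
Set the required displayName property. This property is the name that is displayed in Speech Studio.

Make an HTTP POST request using the URI as shown in the following example. Replace YourSubscriptionKey with your Speech resource key, replace YourServiceRegion with your Speech resource region, and set the request body properties as previously described.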

curl -v -X POST -H "Ocp-Apim-Subscription-Key: YourSubscriptionKey" -H "Content-Type: application/json" -d '{
  "model1": {
    "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/models/base/13fb305e-09ad-4bce-b3a1-938c9124dda3"
  },
  "model2": {
    "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/models/base/13fb305e-09ad-4bce-b3a1-938c9124dda3"
  },
  "dataset": {
    "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/datasets/23b6554d-21f9-4df1-89cb-f84510ac8d23"
  },
  "project": {
    "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/projects/0198f569-cc11-4099-a0e8-9d55bc3d0c52"
  },
  "displayName": "My Inspection",
  "description": "My Inspection Description",
  "locale": "en-US"
}'  "https://YourServiceRegion.api.cognitive.microsoft.com/speechtotext/v3.2/evaluations"
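The same POST request can be prepared from Python. The following is a minimal sketch using only the standard library; the helper names base_url and build_create_evaluation_request are hypothetical, the IDs are the sample values from this article, and the call that actually sends the request is left commented out.

```python
import json
import urllib.request

API_VERSION = "v3.2"

def base_url(region):
    """Root of the Speech to text REST API for a region."""
    return f"https://{region}.api.cognitive.microsoft.com/speechtotext/{API_VERSION}"

def build_create_evaluation_request(region, key, body):
    """Prepare (but don't send) the Evaluations_Create POST request."""
    return urllib.request.Request(
        f"{base_url(region)}/evaluations",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        method="POST",
    )

root = base_url("eastus")
# The same model is used twice, so no side-by-side comparison is requested.
model = {"self": f"{root}/models/base/13fb305e-09ad-4bce-b3a1-938c9124dda3"}
body = {
    "model1": model,
    "model2": model,
    "dataset": {"self": f"{root}/datasets/23b6554d-21f9-4df1-89cb-f84510ac8d23"},
    "project": {"self": f"{root}/projects/0198f569-cc11-4099-a0e8-9d55bc3d0c52"},
    "displayName": "My Inspection",
    "description": "My Inspection Description",
    "locale": "en-US",
}

request = build_create_evaluation_request("YourServiceRegion", "YourSubscriptionKey", body)
# Replace the placeholders above, then send the request:
# with urllib.request.urlopen(request) as response:
#     evaluation = json.load(response)
```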

You should receive a response body in the following format:

{
  "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/evaluations/9c06d5b1-213f-4a16-9069-bc86efacdaac",
  "model1": {
    "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/models/base/13fb305e-09ad-4bce-b3a1-938c9124dda3"
  },
  "model2": {
    "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/models/base/13fb305e-09ad-4bce-b3a1-938c9124dda3"
  },
  "dataset": {
    "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/datasets/23b6554d-21f9-4df1-89cb-f84510ac8d23"
  },
  "transcription2": {
    "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/transcriptions/b50642a8-febf-43e1-b9d3-e0c90b82a62a"
  },
  "transcription1": {
    "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/transcriptions/b50642a8-febf-43e1-b9d3-e0c90b82a62a"
  },
  "project": {
    "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/projects/0198f569-cc11-4099-a0e8-9d55bc3d0c52"
  },
  "links": {
    "files": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/evaluations/9c06d5b1-213f-4a16-9069-bc86efacdaac/files"
  },
  "properties": {
    "wordErrorRate1": -1.0,
    "sentenceErrorRate1": -1.0,
    "sentenceCount1": -1,
    "wordCount1": -1,
    "correctWordCount1": -1,
    "wordSubstitutionCount1": -1,
    "wordDeletionCount1": -1,
    "wordInsertionCount1": -1,
    "wordErrorRate2": -1.0,
    "sentenceErrorRate2": -1.0,
    "sentenceCount2": -1,
    "wordCount2": -1,
    "correctWordCount2": -1,
    "wordSubstitutionCount2": -1,
    "wordDeletionCount2": -1,
    "wordInsertionCount2": -1
  },
  "lastActionDateTime": "2024-07-14T21:21:39Z",
  "status": "NotStarted",
  "createdDateTime": "2024-07-14T21:21:39Z",
  "locale": "en-US",
  "displayName": "My Inspection",
  "description": "My Inspection Description"
}

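The top-level self property in the response body is the evaluation's URI. Use this URI to get details about the evaluation's project and test results. You can also use this URI to update or delete the evaluation.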

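Get test results

Get the test results, and then inspect the audio dataset against each model's transcription results.

Follow these steps to get test results:

Sign in to Speech Studio.
Select [Custom speech] > your project name > [Test models].
Select the link by test name.
After the test is complete, as indicated by the status set to [Succeeded], you should see results that include the WER number for each tested model.

This page lists all the utterances in your dataset and the recognition results, alongside the transcription from the submitted dataset. You can toggle various error types, including insertion, deletion, and substitution. By listening to the audio and comparing the recognition results in each column, you can decide which model meets your needs and where more training and improvement are required.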

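To get test results, use the spx csr evaluation status command. Construct the request parameters according to the following instructions:

Set the required evaluation parameter to the ID of the evaluation whose test results you want to get.

Here's an example Speech CLI command that gets test results: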

spx csr evaluation status --api-version v3.2 --evaluation 9c06d5b1-213f-4a16-9069-bc86efacdaac

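The models, audio dataset, transcriptions, and more details are returned in the response body.

You should receive a response body in the following format: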

{
  "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/evaluations/9c06d5b1-213f-4a16-9069-bc86efacdaac",
  "model1": {
    "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/models/base/13fb305e-09ad-4bce-b3a1-938c9124dda3"
  },
  "model2": {
    "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/models/base/13fb305e-09ad-4bce-b3a1-938c9124dda3"
  },
  "dataset": {
    "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/datasets/23b6554d-21f9-4df1-89cb-f84510ac8d23"
  },
  "transcription2": {
    "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/transcriptions/b50642a8-febf-43e1-b9d3-e0c90b82a62a"
  },
  "transcription1": {
    "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/transcriptions/b50642a8-febf-43e1-b9d3-e0c90b82a62a"
  },
  "project": {
    "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/projects/0198f569-cc11-4099-a0e8-9d55bc3d0c52"
  },
  "links": {
    "files": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/evaluations/9c06d5b1-213f-4a16-9069-bc86efacdaac/files"
  },
  "properties": {
    "wordErrorRate1": 0.028900000000000002,
    "sentenceErrorRate1": 0.667,
    "tokenErrorRate1": 0.12119999999999999,
    "sentenceCount1": 3,
    "wordCount1": 173,
    "correctWordCount1": 170,
    "wordSubstitutionCount1": 2,
    "wordDeletionCount1": 1,
    "wordInsertionCount1": 2,
    "tokenCount1": 165,
    "correctTokenCount1": 145,
    "tokenSubstitutionCount1": 10,
    "tokenDeletionCount1": 1,
    "tokenInsertionCount1": 9,
    "tokenErrors1": {
      "punctuation": {
        "numberOfEdits": 4,
        "percentageOfAllEdits": 20.0
      },
      "capitalization": {
        "numberOfEdits": 2,
        "percentageOfAllEdits": 10.0
      },
      "inverseTextNormalization": {
        "numberOfEdits": 1,
        "percentageOfAllEdits": 5.0
      },
      "lexical": {
        "numberOfEdits": 12,
        "percentageOfAllEdits": 60.0
      },
      "others": {
        "numberOfEdits": 1,
        "percentageOfAllEdits": 5.0
      }
    },
    "wordErrorRate2": 0.028900000000000002,
    "sentenceErrorRate2": 0.667,
    "tokenErrorRate2": 0.12119999999999999,
    "sentenceCount2": 3,
    "wordCount2": 173,
    "correctWordCount2": 170,
    "wordSubstitutionCount2": 2,
    "wordDeletionCount2": 1,
    "wordInsertionCount2": 2,
    "tokenCount2": 165,
    "correctTokenCount2": 145,
    "tokenSubstitutionCount2": 10,
    "tokenDeletionCount2": 1,
    "tokenInsertionCount2": 9,
    "tokenErrors2": {
      "punctuation": {
        "numberOfEdits": 4,
        "percentageOfAllEdits": 20.0
      },
      "capitalization": {
        "numberOfEdits": 2,
        "percentageOfAllEdits": 10.0
      },
      "inverseTextNormalization": {
        "numberOfEdits": 1,
        "percentageOfAllEdits": 5.0
      },
      "lexical": {
        "numberOfEdits": 12,
        "percentageOfAllEdits": 60.0
      },
      "others": {
        "numberOfEdits": 1,
        "percentageOfAllEdits": 5.0
      }
    }
  },
  "lastActionDateTime": "2024-07-14T21:22:45Z",
  "status": "Succeeded",
  "createdDateTime": "2024-07-14T21:21:39Z",
  "locale": "en-US",
  "displayName": "My Inspection",
  "description": "My Inspection Description"
}
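The error rates in the properties object can be cross-checked against the counts reported next to them. The following Python sketch assumes that each rate is (substitutions + deletions + insertions) divided by the reference count; that formula is an assumption here, although it reproduces the sample values above. The function name error_rate is hypothetical.

```python
def error_rate(substitutions, deletions, insertions, reference_count):
    """Edit-distance error rate: (S + D + I) / N. Assumed formula."""
    return (substitutions + deletions + insertions) / reference_count

# Counts for model 1, copied from the sample response.
properties = {
    "wordCount1": 173,
    "wordSubstitutionCount1": 2,
    "wordDeletionCount1": 1,
    "wordInsertionCount1": 2,
    "tokenCount1": 165,
    "tokenSubstitutionCount1": 10,
    "tokenDeletionCount1": 1,
    "tokenInsertionCount1": 9,
}

word_error_rate = error_rate(
    properties["wordSubstitutionCount1"],
    properties["wordDeletionCount1"],
    properties["wordInsertionCount1"],
    properties["wordCount1"],
)
token_error_rate = error_rate(
    properties["tokenSubstitutionCount1"],
    properties["tokenDeletionCount1"],
    properties["tokenInsertionCount1"],
    properties["tokenCount1"],
)

# Rounded to four decimals for comparison with wordErrorRate1 and tokenErrorRate1.
print(round(word_error_rate, 4), round(token_error_rate, 4))
```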

For Speech CLI help with evaluations, run the following command:

spx help csr evaluation

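To get test results, start by using the Evaluations_Get operation of the Speech to text REST API.

Make an HTTP GET request using the URI as shown in the following example. Replace YourEvaluationId with your evaluation ID, replace YourSubscriptionKey with your Speech resource key, and replace YourServiceRegion with your Speech resource region.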

curl -v -X GET "https://YourServiceRegion.api.cognitive.microsoft.com/speechtotext/v3.2/evaluations/YourEvaluationId" -H "Ocp-Apim-Subscription-Key: YourSubscriptionKey"

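The models, audio dataset, transcriptions, and more details are returned in the response body.

You should receive a response body in the following format: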

{
  "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/evaluations/9c06d5b1-213f-4a16-9069-bc86efacdaac",
  "model1": {
    "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/models/base/13fb305e-09ad-4bce-b3a1-938c9124dda3"
  },
  "model2": {
    "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/models/base/13fb305e-09ad-4bce-b3a1-938c9124dda3"
  },
  "dataset": {
    "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/datasets/23b6554d-21f9-4df1-89cb-f84510ac8d23"
  },
  "transcription2": {
    "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/transcriptions/b50642a8-febf-43e1-b9d3-e0c90b82a62a"
  },
  "transcription1": {
    "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/transcriptions/b50642a8-febf-43e1-b9d3-e0c90b82a62a"
  },
  "project": {
    "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/projects/0198f569-cc11-4099-a0e8-9d55bc3d0c52"
  },
  "links": {
    "files": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/evaluations/9c06d5b1-213f-4a16-9069-bc86efacdaac/files"
  },
  "properties": {
    "wordErrorRate1": 0.028900000000000002,
    "sentenceErrorRate1": 0.667,
    "tokenErrorRate1": 0.12119999999999999,
    "sentenceCount1": 3,
    "wordCount1": 173,
    "correctWordCount1": 170,
    "wordSubstitutionCount1": 2,
    "wordDeletionCount1": 1,
    "wordInsertionCount1": 2,
    "tokenCount1": 165,
    "correctTokenCount1": 145,
    "tokenSubstitutionCount1": 10,
    "tokenDeletionCount1": 1,
    "tokenInsertionCount1": 9,
    "tokenErrors1": {
      "punctuation": {
        "numberOfEdits": 4,
        "percentageOfAllEdits": 20.0
      },
      "capitalization": {
        "numberOfEdits": 2,
        "percentageOfAllEdits": 10.0
      },
      "inverseTextNormalization": {
        "numberOfEdits": 1,
        "percentageOfAllEdits": 5.0
      },
      "lexical": {
        "numberOfEdits": 12,
        "percentageOfAllEdits": 60.0
      },
      "others": {
        "numberOfEdits": 1,
        "percentageOfAllEdits": 5.0
      }
    },
    "wordErrorRate2": 0.028900000000000002,
    "sentenceErrorRate2": 0.667,
    "tokenErrorRate2": 0.12119999999999999,
    "sentenceCount2": 3,
    "wordCount2": 173,
    "correctWordCount2": 170,
    "wordSubstitutionCount2": 2,
    "wordDeletionCount2": 1,
    "wordInsertionCount2": 2,
    "tokenCount2": 165,
    "correctTokenCount2": 145,
    "tokenSubstitutionCount2": 10,
    "tokenDeletionCount2": 1,
    "tokenInsertionCount2": 9,
    "tokenErrors2": {
      "punctuation": {
        "numberOfEdits": 4,
        "percentageOfAllEdits": 20.0
      },
      "capitalization": {
        "numberOfEdits": 2,
        "percentageOfAllEdits": 10.0
      },
      "inverseTextNormalization": {
        "numberOfEdits": 1,
        "percentageOfAllEdits": 5.0
      },
      "lexical": {
        "numberOfEdits": 12,
        "percentageOfAllEdits": 60.0
      },
      "others": {
        "numberOfEdits": 1,
        "percentageOfAllEdits": 5.0
      }
    }
  },
  "lastActionDateTime": "2024-07-14T21:22:45Z",
  "status": "Succeeded",
  "createdDateTime": "2024-07-14T21:21:39Z",
  "locale": "en-US",
  "displayName": "My Inspection",
  "description": "My Inspection Description"
}
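A newly created evaluation reports the status NotStarted, and results appear once the status is Succeeded, so a client usually has to poll the evaluation. The following Python sketch shows one way to do that. The helper names, the polling interval, and the treatment of Failed as a second terminal status are assumptions and aren't taken from this article.

```python
import json
import time
import urllib.request

# "Succeeded" appears in the sample response; "Failed" is assumed to be the other terminal status.
TERMINAL_STATUSES = {"Succeeded", "Failed"}

def fetch_evaluation(region, key, evaluation_id, api_version="v3.2"):
    """Call Evaluations_Get and return the parsed JSON body."""
    url = (
        f"https://{region}.api.cognitive.microsoft.com/speechtotext/"
        f"{api_version}/evaluations/{evaluation_id}"
    )
    request = urllib.request.Request(url, headers={"Ocp-Apim-Subscription-Key": key})
    with urllib.request.urlopen(request) as response:
        return json.load(response)

def wait_for_evaluation(fetch, interval_seconds=30, max_attempts=20, sleep=time.sleep):
    """Call `fetch()` until the evaluation reaches a terminal status."""
    for attempt in range(max_attempts):
        evaluation = fetch()
        if evaluation["status"] in TERMINAL_STATUSES:
            return evaluation
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError("evaluation did not finish in time")

# Usage (replace the placeholders before running):
# result = wait_for_evaluation(
#     lambda: fetch_evaluation("YourServiceRegion", "YourSubscriptionKey", "YourEvaluationId")
# )
# print(result["status"], result["properties"].get("wordErrorRate1"))
```

Passing the fetch step in as a function keeps the polling loop independent of the HTTP client, which makes it easy to exercise without network access.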

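Compare transcription with audio

You can inspect the transcription output of each tested model against the audio input dataset. If you included two models in the test, you can compare their transcription quality side by side.

To review the quality of transcriptions:

Sign in to Speech Studio.
Select [Custom speech] > your project name > [Test models].
Select the link by test name.
Play an audio file while reading the corresponding transcription by a model.

If the test dataset includes multiple audio files, you see multiple rows in the table. If you included two models in the test, the transcriptions are shown in side-by-side columns. Transcription differences between the models are shown in blue text.

Screenshot comparing transcriptions by two models.

The audio test dataset, transcriptions, and tested models are returned in the test results. If only one model was tested, the model1 value matches model2, and the transcription1 value matches transcription2.

To review the quality of transcriptions:

Download the audio test dataset, unless you already have a copy.
Download the output transcriptions.
Play an audio file while reading the corresponding transcription by a model.

If you're comparing quality between two models, pay particular attention to the differences between each model's transcriptions.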
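Once both transcriptions are downloaded, the differences between them can also be listed programmatically. The following is a minimal Python sketch that uses the standard difflib module. The two transcript strings are made-up examples, and the function name word_differences is hypothetical.

```python
import difflib

def word_differences(transcript1, transcript2):
    """Return (operation, words_in_1, words_in_2) for every span where the transcripts differ."""
    words1, words2 = transcript1.split(), transcript2.split()
    matcher = difflib.SequenceMatcher(a=words1, b=words2, autojunk=False)
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            differences.append((tag, " ".join(words1[i1:i2]), " ".join(words2[j1:j2])))
    return differences

# Hypothetical transcriptions of the same audio file by two models.
model1_text = "turn on the kitchen lights"
model2_text = "turn on the chicken light"

for operation, in_model1, in_model2 in word_differences(model1_text, model2_text):
    print(f"{operation}: model1={in_model1!r} model2={in_model2!r}")
```

A listing like this only shows where the models disagree. Listening to the audio is still needed to decide which transcription is correct.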
