How to log audio and transcriptions for speech recognition

09/20/2024

You can enable logging for both audio input and recognized speech when you use speech to text or speech translation. For speech translation, only the audio and transcription of the original audio are logged; the translations aren't logged. This article describes how to enable, access, and delete the audio and transcription logs.

You can use audio and transcription logs as input for custom speech model training, among other use cases.

Warning

Don't depend on audio and transcription logs when you need an exact record of the input audio. During periods of peak load, the service prioritizes hardware resources for transcription tasks, so small portions of the audio might not be logged. Such occasions are rare, but possible.

Logging is done asynchronously for both base and custom model endpoints. The Speech service stores audio and transcription logs in its internal storage; they aren't written locally. The logs are retained for 30 days and then deleted automatically. However, you can delete specific logs or a range of available logs at any time.

Instead of keeping the logs on Speech service premises, you can store audio and transcription logs in an Azure Storage account that you own and control by using Bring-your-own-storage (BYOS) technology. For details, see the article on how to use a BYOS-enabled Speech resource.

Enable audio and transcription logging

Logging is disabled by default. Logging can be enabled per recognition session or per custom model endpoint.

Enable logging for a single recognition session

You can enable logging for a single recognition session, whether you use the default base model or a custom model endpoint.

Warning

For custom model endpoints, the logging setting of your deployed endpoint is prioritized over your session-level setting (SDK or REST API). If logging is enabled for the custom model endpoint, the session-level setting (whether it's set to true or false) is ignored. If logging isn't enabled for the custom model endpoint, the session-level setting determines whether logging is active.
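The precedence rule above can be expressed as a small decision function. This is a minimal sketch for illustration, not part of the Speech SDK: is_logging_active is a hypothetical helper, and passing None for the endpoint setting is an assumption used here to represent the default base model, which has no endpoint-level setting.

```python
from typing import Optional


def is_logging_active(endpoint_logging_enabled: Optional[bool], session_logging_enabled: bool) -> bool:
    """Hypothetical helper: decide whether a recognition session is logged.

    endpoint_logging_enabled: persistent setting of the custom model endpoint,
        or None when the default base model is used (assumption for this sketch).
    session_logging_enabled: session-level setting from the SDK or REST API.
    """
    if endpoint_logging_enabled:
        # Logging is on for the endpoint, so the session-level setting is ignored.
        return True
    # Base model, or custom endpoint with logging off: the session-level setting decides.
    return session_logging_enabled


# Endpoint logging on: the session setting (False here) is ignored.
print(is_logging_active(True, False))   # True
# Endpoint logging off: the session setting decides.
print(is_logging_active(False, True))   # True
print(is_logging_active(False, False))  # False
```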

Enable logging for speech to text with the Speech SDK

C#

To enable audio and transcription logging with the Speech SDK, call the EnableAudioLogging() method of your SpeechConfig instance.

speechConfig.EnableAudioLogging();

To check whether logging is enabled, get the value of the SpeechServiceConnection_EnableAudioLogging property:

string isAudioLoggingEnabled = speechConfig.GetProperty(PropertyId.SpeechServiceConnection_EnableAudioLogging);

Each SpeechRecognizer that uses this speechConfig has audio and transcription logging enabled.

C++

To enable audio and transcription logging with the Speech SDK, call the EnableAudioLogging method of your SpeechConfig instance.

speechConfig->EnableAudioLogging();

To check whether logging is enabled, get the value of the SpeechServiceConnection_EnableAudioLogging property:

std::string isAudioLoggingEnabled = speechConfig->GetProperty(PropertyId::SpeechServiceConnection_EnableAudioLogging);

Each SpeechRecognizer that uses this speechConfig has audio and transcription logging enabled.

Java

To enable audio and transcription logging with the Speech SDK, call the enableAudioLogging() method of your SpeechConfig instance.

speechConfig.enableAudioLogging();

To check whether logging is enabled, get the value of the SpeechServiceConnection_EnableAudioLogging property:

String isAudioLoggingEnabled = speechConfig.getProperty(PropertyId.SpeechServiceConnection_EnableAudioLogging);

Each SpeechRecognizer that uses this speechConfig has audio and transcription logging enabled.

JavaScript

To enable audio and transcription logging with the Speech SDK, call the enableAudioLogging() method of your SpeechConfig instance.

speechConfig.enableAudioLogging();

To check whether logging is enabled, get the value of the SpeechServiceConnection_EnableAudioLogging property:

const SpeechSDK = require("microsoft-cognitiveservices-speech-sdk");
// <...>
const isAudioLoggingEnabled = speechConfig.getProperty(SpeechSDK.PropertyId.SpeechServiceConnection_EnableAudioLogging);

Each SpeechRecognizer that uses this speechConfig has audio and transcription logging enabled.

Python

To enable audio and transcription logging with the Speech SDK, call the enable_audio_logging() method of your SpeechConfig instance.

speech_config.enable_audio_logging()

To check whether logging is enabled, get the value of the SpeechServiceConnection_EnableAudioLogging property:

import azure.cognitiveservices.speech as speechsdk
# <...>
is_audio_logging_enabled = speech_config.get_property(property_id=speechsdk.PropertyId.SpeechServiceConnection_EnableAudioLogging)

Each SpeechRecognizer that uses this speech_config has audio and transcription logging enabled.

Objective-C

To enable audio and transcription logging with the Speech SDK, call the enableAudioLogging method of your SPXSpeechConfiguration instance.

[speechConfig enableAudioLogging];

To check whether logging is enabled, get the value of the SPXSpeechServiceConnectionEnableAudioLogging property:

NSString *isAudioLoggingEnabled = [speechConfig getPropertyById:SPXSpeechServiceConnectionEnableAudioLogging];

Each SPXSpeechRecognizer that uses this speechConfig has audio and transcription logging enabled.

Enable logging for speech translation with the Speech SDK

For speech translation, only the audio and transcription of the original audio are logged. The translations aren't logged.

C#

To enable audio and transcription logging with the Speech SDK, call the EnableAudioLogging() method of your SpeechTranslationConfig instance.

speechTranslationConfig.EnableAudioLogging();

To check whether logging is enabled, get the value of the SpeechServiceConnection_EnableAudioLogging property:

string isAudioLoggingEnabled = speechTranslationConfig.GetProperty(PropertyId.SpeechServiceConnection_EnableAudioLogging);

Each TranslationRecognizer that uses this speechTranslationConfig has audio and transcription logging enabled.

C++

To enable audio and transcription logging with the Speech SDK, call the EnableAudioLogging method of your SpeechTranslationConfig instance.

speechTranslationConfig->EnableAudioLogging();

To check whether logging is enabled, get the value of the SpeechServiceConnection_EnableAudioLogging property:

std::string isAudioLoggingEnabled = speechTranslationConfig->GetProperty(PropertyId::SpeechServiceConnection_EnableAudioLogging);

Each TranslationRecognizer that uses this speechTranslationConfig has audio and transcription logging enabled.

Java

To enable audio and transcription logging with the Speech SDK, call the enableAudioLogging() method of your SpeechTranslationConfig instance.

speechTranslationConfig.enableAudioLogging();

To check whether logging is enabled, get the value of the SpeechServiceConnection_EnableAudioLogging property:

String isAudioLoggingEnabled = speechTranslationConfig.getProperty(PropertyId.SpeechServiceConnection_EnableAudioLogging);

Each TranslationRecognizer that uses this speechTranslationConfig has audio and transcription logging enabled.

JavaScript

To enable audio and transcription logging with the Speech SDK, call the enableAudioLogging() method of your SpeechTranslationConfig instance.

speechTranslationConfig.enableAudioLogging();

To check whether logging is enabled, get the value of the SpeechServiceConnection_EnableAudioLogging property:

const SpeechSDK = require("microsoft-cognitiveservices-speech-sdk");
// <...>
const isAudioLoggingEnabled = speechTranslationConfig.getProperty(SpeechSDK.PropertyId.SpeechServiceConnection_EnableAudioLogging);

Each TranslationRecognizer that uses this speechTranslationConfig has audio and transcription logging enabled.

Python

To enable audio and transcription logging with the Speech SDK, call the enable_audio_logging() method of your SpeechTranslationConfig instance.

speech_translation_config.enable_audio_logging()

To check whether logging is enabled, get the value of the SpeechServiceConnection_EnableAudioLogging property:

import azure.cognitiveservices.speech as speechsdk
# <...>
is_audio_logging_enabled = speech_translation_config.get_property(property_id=speechsdk.PropertyId.SpeechServiceConnection_EnableAudioLogging)

Each TranslationRecognizer that uses this speech_translation_config has audio and transcription logging enabled.

Objective-C

To enable audio and transcription logging with the Speech SDK, call the enableAudioLogging method of your SPXSpeechTranslationConfiguration instance.

[speechTranslationConfig enableAudioLogging];

To check whether logging is enabled, get the value of the SPXSpeechServiceConnectionEnableAudioLogging property:

NSString *isAudioLoggingEnabled = [speechTranslationConfig getPropertyById:SPXSpeechServiceConnectionEnableAudioLogging];

Each SPXTranslationRecognizer that uses this speechTranslationConfig has audio and transcription logging enabled.

Enable logging for Speech to text REST API for short audio

If you use the Speech to text REST API for short audio and want to enable audio and transcription logging, add the query parameter storeAudio=true to your REST request. A sample request looks like this:

https://eastus.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language=en-US&storeAudio=true
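If you build the request URL in code, storeAudio is added like any other query parameter. The following sketch uses only the Python standard library; build_short_audio_url is a hypothetical helper, and it assumes the conversation recognition mode and the URL shape shown in the sample request.

```python
from urllib.parse import urlencode


def build_short_audio_url(region: str, language: str, store_audio: bool = True) -> str:
    """Hypothetical helper: build a short-audio recognition URL (conversation mode)."""
    params = {"language": language}
    if store_audio:
        # storeAudio=true enables audio and transcription logging for this request.
        params["storeAudio"] = "true"
    return (
        f"https://{region}.stt.speech.microsoft.com/speech/recognition/"
        f"conversation/cognitiveservices/v1?{urlencode(params)}"
    )


print(build_short_audio_url("eastus", "en-US"))
# https://eastus.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language=en-US&storeAudio=true
```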

Enable audio and transcription logging for a custom model endpoint

This method is applicable for custom speech endpoints only.

You can enable or disable logging in the persistent settings of a custom model endpoint. When logging is enabled for a custom model endpoint, you don't need to enable it at the recognition session level with the SDK or REST API. When logging isn't enabled for a custom model endpoint, you can still enable it temporarily at the recognition session level with the SDK or REST API.


You can enable audio and transcription logging for a custom model endpoint:

When you create the endpoint using the Speech Studio, REST API, or Speech CLI. For details about how to enable logging for a custom speech endpoint, see Deploy a custom speech model.
When you update the endpoint (Endpoints_Update) using the Speech to text REST API. For an example of how to update the logging setting for an endpoint, see Turn off logging for a custom model endpoint. In that example, set the contentLoggingEnabled property to true instead of false to enable logging for the endpoint.

Turn off logging for a custom model endpoint

To disable audio and transcription logging for a custom model endpoint, you must update the persistent endpoint logging setting by using the Speech to text REST API. You can't disable logging for an existing custom model endpoint in Speech Studio.

To turn off logging for a custom endpoint, use the Endpoints_Update operation of the Speech to text REST API. Construct the request body according to the following instructions:

Set the contentLoggingEnabled property within properties. Set this property to true to enable logging of the endpoint's traffic. Set this property to false to disable logging of the endpoint's traffic.

Make an HTTP PATCH request using the URI as shown in the following example. Replace YourSubscriptionKey with your Speech resource key, replace YourServiceRegion with your Speech resource region, replace YourEndpointId with your endpoint ID, and set the request body properties as previously described.

curl -v -X PATCH -H "Ocp-Apim-Subscription-Key: YourSubscriptionKey" -H "Content-Type: application/json" -d '{
  "properties": {
    "contentLoggingEnabled": false
  }
}'  "https://YourServiceRegion.api.cognitive.microsoft.com/speechtotext/v3.2/endpoints/YourEndpointId"
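The same Endpoints_Update request can be prepared in Python with the standard library. This is a sketch, not an official client: build_update_logging_request is a hypothetical helper, and it assumes the v3.2 URL and headers shown in the curl example. Pass enabled=True to turn logging back on.

```python
import json
import urllib.request


def build_update_logging_request(
    region: str, subscription_key: str, endpoint_id: str, enabled: bool
) -> urllib.request.Request:
    """Hypothetical helper: prepare (but don't send) an Endpoints_Update PATCH request."""
    url = f"https://{region}.api.cognitive.microsoft.com/speechtotext/v3.2/endpoints/{endpoint_id}"
    # Request body: {"properties": {"contentLoggingEnabled": <enabled>}}
    body = json.dumps({"properties": {"contentLoggingEnabled": enabled}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Ocp-Apim-Subscription-Key": subscription_key,
            "Content-Type": "application/json",
        },
    )


request = build_update_logging_request("YourServiceRegion", "YourSubscriptionKey", "YourEndpointId", enabled=False)
# To send it, uncomment the following lines (requires a real key, region, and endpoint ID):
# with urllib.request.urlopen(request) as response:
#     endpoint = json.load(response)
```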

You should receive a response body in the following format:

{
  "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/endpoints/a07164e8-22d1-4eb7-aa31-bf6bb1097f37",
  "model": {
    "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/models/9e240dc1-3d2d-4ac9-98ec-1be05ba0e9dd"
  },
  "links": {
    "logs": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/endpoints/a07164e8-22d1-4eb7-aa31-bf6bb1097f37/files/logs",
    "restInteractive": "https://eastus.stt.speech.microsoft.com/speech/recognition/interactive/cognitiveservices/v1?cid=a07164e8-22d1-4eb7-aa31-bf6bb1097f37",
    "restConversation": "https://eastus.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?cid=a07164e8-22d1-4eb7-aa31-bf6bb1097f37",
    "restDictation": "https://eastus.stt.speech.microsoft.com/speech/recognition/dictation/cognitiveservices/v1?cid=a07164e8-22d1-4eb7-aa31-bf6bb1097f37",
    "webSocketInteractive": "wss://eastus.stt.speech.microsoft.com/speech/recognition/interactive/cognitiveservices/v1?cid=a07164e8-22d1-4eb7-aa31-bf6bb1097f37",
    "webSocketConversation": "wss://eastus.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?cid=a07164e8-22d1-4eb7-aa31-bf6bb1097f37",
    "webSocketDictation": "wss://eastus.stt.speech.microsoft.com/speech/recognition/dictation/cognitiveservices/v1?cid=a07164e8-22d1-4eb7-aa31-bf6bb1097f37"
  },
  "project": {
    "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/projects/0198f569-cc11-4099-a0e8-9d55bc3d0c52"
  },
  "properties": {
    "loggingEnabled": false
  },
  "lastActionDateTime": "2024-07-15T16:30:12Z",
  "status": "Succeeded",
  "createdDateTime": "2024-07-15T16:29:36Z",
  "locale": "en-US",
  "displayName": "My Endpoint",
  "description": "My Endpoint Description"
}

The response body should reflect the new setting. The name of the logging property in the response (loggingEnabled) is different from the name of the logging property that you set in the request (contentLoggingEnabled).
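Because the request and response use different property names, a check on the response should read loggingEnabled rather than contentLoggingEnabled. A minimal sketch, assuming the response body is available as a JSON string; logging_enabled_in_response is a hypothetical helper.

```python
import json


def logging_enabled_in_response(response_body: str) -> bool:
    """Hypothetical helper: return the logging state reported in an Endpoints_Update response.

    The response reports the setting as properties.loggingEnabled, even though
    the request sets properties.contentLoggingEnabled.
    """
    endpoint = json.loads(response_body)
    return bool(endpoint["properties"]["loggingEnabled"])


sample = '{"properties": {"loggingEnabled": false}, "status": "Succeeded"}'
print(logging_enabled_in_response(sample))  # False
```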

Get audio and transcription logs

You can access audio and transcription logs by using the Speech to text REST API. For custom model endpoints, you can also use Speech Studio. See details in the following sections.

Note

Logging data is kept for 30 days. After this period, the logs are automatically deleted. However, you can delete specific logs or a range of available logs at any time.

Get audio and transcription logs with Speech Studio

This method is applicable for custom model endpoints only.

To download the endpoint logs:

Sign in to the Speech Studio.
Select Custom speech > Your project name > Deploy models.
Select the link by endpoint name.
Under Content logging, select Download log.

With this approach, you can download all available log sets at once. There's no way to download selected log sets in Speech Studio.

Get audio and transcription logs with Speech to text REST API

You can download all or a subset of available log sets.

This method is applicable for base and custom model endpoints. To list and download audio and transcription logs:

Base models: Use the Endpoints_ListBaseModelLogs operation of the Speech to text REST API. This operation gets the list of audio and transcription logs that are stored when using the default base model of a given language.
Custom model endpoints: Use the Endpoints_ListLogs operation of the Speech to text REST API. This operation gets the list of audio and transcription logs that are stored for a given endpoint.
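Both list operations are GET requests on a logs collection. The following Python sketch prepares such a request with the standard library. The custom endpoint path matches the logs link returned in the endpoint response earlier in this article; the base model path, endpoints/base/{locale}/files/logs, is an assumption to confirm against the REST API reference. build_list_logs_request is a hypothetical helper.

```python
import urllib.request
from typing import Optional


def build_list_logs_request(
    region: str,
    subscription_key: str,
    endpoint_id: Optional[str] = None,
    locale: Optional[str] = None,
) -> urllib.request.Request:
    """Hypothetical helper: prepare a GET request that lists logs."""
    root = f"https://{region}.api.cognitive.microsoft.com/speechtotext/v3.2"
    if endpoint_id is not None:
        # Endpoints_ListLogs: logs stored for a given custom model endpoint.
        url = f"{root}/endpoints/{endpoint_id}/files/logs"
    elif locale is not None:
        # Endpoints_ListBaseModelLogs: logs for the base model of a language (assumed path).
        url = f"{root}/endpoints/base/{locale}/files/logs"
    else:
        raise ValueError("Pass either endpoint_id or locale.")
    return urllib.request.Request(url, headers={"Ocp-Apim-Subscription-Key": subscription_key})


request = build_list_logs_request("eastus", "YourSubscriptionKey", locale="en-US")
print(request.get_method(), request.full_url)
# GET https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/endpoints/base/en-US/files/logs
```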

Get log IDs with Speech to text REST API

In some scenarios, you might need to get IDs of the available logs. For example, you might want to delete a specific log as described later in this article.

To get IDs of the available logs, use the same list operations as in the previous section:

Base models: Endpoints_ListBaseModelLogs.
Custom model endpoints: Endpoints_ListLogs.

Here's a sample output of Endpoints_ListLogs. For simplicity, only one log set is shown:

{
  "values": [
    {
      "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/endpoints/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/files/logs/2023-03-13_163715__0420c53d-e6ac-4857-bce0-f39c3f9f5ff9_v2_json",
      "name": "163715__0420c53d-e6ac-4857-bce0-f39c3f9f5ff9.v2.json",
      "kind": "Transcription",
      "properties": {
        "size": 79920
      },
      "createdDateTime": "2024-07-15T16:29:36Z",
      "links": {
        "contentUrl": "<Link to download log file>"
      }
    },
    {
      "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/endpoints/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/files/logs/2023-03-13_163715__0420c53d-e6ac-4857-bce0-f39c3f9f5ff9_wav",
      "name": "163715__0420c53d-e6ac-4857-bce0-f39c3f9f5ff9.wav",
      "kind": "Audio",
      "properties": {
        "size": 932966
      },
      "createdDateTime": "2024-07-15T16:29:36Z",
      "links": {
        "contentUrl": "<Link to download log file>"
      }
    }
  ]
}

The response body returns the location of each audio and transcription log file. Check the kind property to determine whether a file contains the audio ("kind": "Audio") or the transcription ("kind": "Transcription").

The log ID for each log file is the last part of the URL in the "self" element value. The log ID in the following example is 2023-03-13_163715__0420c53d-e6ac-4857-bce0-f39c3f9f5ff9_v2_json.

"self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/endpoints/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/files/logs/2023-03-13_163715__0420c53d-e6ac-4857-bce0-f39c3f9f5ff9_v2_json"
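To collect the IDs programmatically, take the last path segment of each self URL and group the results by kind. A minimal sketch using only the Python standard library; log_ids_by_kind is a hypothetical helper, and it processes a single response page only.

```python
import json
from urllib.parse import urlparse


def log_ids_by_kind(list_response_body: str) -> dict:
    """Hypothetical helper: map each log kind ("Audio", "Transcription") to its log IDs."""
    ids = {}
    for entry in json.loads(list_response_body).get("values", []):
        # The log ID is the last path segment of the "self" URL.
        log_id = urlparse(entry["self"]).path.rstrip("/").rsplit("/", 1)[-1]
        ids.setdefault(entry["kind"], []).append(log_id)
    return ids


base = "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/endpoints/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/files/logs/"
sample = json.dumps({"values": [
    {"self": base + "2023-03-13_163715__0420c53d-e6ac-4857-bce0-f39c3f9f5ff9_v2_json", "kind": "Transcription"},
    {"self": base + "2023-03-13_163715__0420c53d-e6ac-4857-bce0-f39c3f9f5ff9_wav", "kind": "Audio"},
]})
print(log_ids_by_kind(sample))
```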

Delete audio and transcription logs

Logging data is kept for 30 days. After this period, the logs are automatically deleted. However, you can delete specific logs or a range of available logs at any time.

For any base or custom model endpoint, you can delete all available logs, the logs for a given time frame, or a particular log based on its log ID. Deletion runs asynchronously and can take minutes, hours, a day, or longer, depending on the number of log files.

To delete audio and transcription logs, you must use the Speech to text REST API. You can't delete logs in Speech Studio.

Delete all logs or logs for a given time frame

To delete all logs or logs for a given time frame:

Base models: Use the Endpoints_DeleteBaseModelLogs operation of the Speech to text REST API.
Custom model endpoints: Use the Endpoints_DeleteLogs operation of the Speech to text REST API.

Optionally, set endDate to the last day (UTC) whose logs should be deleted, in the format "yyyy-mm-dd". For example, "2023-03-15" deletes all logs from March 15, 2023 and earlier.
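The following Python sketch prepares a delete request for a custom model endpoint or a base model, with an optional endDate. It only builds the request. build_delete_logs_request is a hypothetical helper; passing endDate as a query parameter and the base model path endpoints/base/{locale}/files/logs are assumptions to verify against the REST API reference.

```python
import datetime
import urllib.request
from typing import Optional
from urllib.parse import urlencode


def build_delete_logs_request(
    region: str,
    subscription_key: str,
    endpoint_id: Optional[str] = None,
    locale: Optional[str] = None,
    end_date: Optional[datetime.date] = None,
) -> urllib.request.Request:
    """Hypothetical helper: DELETE all logs, or logs up to and including end_date (UTC)."""
    root = f"https://{region}.api.cognitive.microsoft.com/speechtotext/v3.2"
    if endpoint_id is not None:
        url = f"{root}/endpoints/{endpoint_id}/files/logs"  # Endpoints_DeleteLogs
    elif locale is not None:
        url = f"{root}/endpoints/base/{locale}/files/logs"  # Endpoints_DeleteBaseModelLogs (assumed path)
    else:
        raise ValueError("Pass either endpoint_id or locale.")
    if end_date is not None:
        # date.isoformat() yields the expected yyyy-mm-dd format.
        url += "?" + urlencode({"endDate": end_date.isoformat()})
    return urllib.request.Request(
        url, method="DELETE", headers={"Ocp-Apim-Subscription-Key": subscription_key}
    )


request = build_delete_logs_request(
    "eastus", "YourSubscriptionKey", endpoint_id="YourEndpointId", end_date=datetime.date(2023, 3, 15)
)
print(request.get_method(), request.full_url)
# DELETE https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/endpoints/YourEndpointId/files/logs?endDate=2023-03-15
```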

Delete specific log

To delete a specific log by ID:

Base models: Use the Endpoints_DeleteBaseModelLog operation of the Speech to text REST API.
Custom model endpoints: Use the Endpoints_DeleteLog operation of the Speech to text REST API.

For details about how to get log IDs, see the earlier section Get log IDs with Speech to text REST API.

Audio and transcription logs have separate IDs (for example, 2023-03-13_163715__0420c53d-e6ac-4857-bce0-f39c3f9f5ff9_v2_json and 2023-03-13_163715__0420c53d-e6ac-4857-bce0-f39c3f9f5ff9_wav from the earlier example in this article). To delete both the audio and the transcription log, send a separate delete-by-ID request for each.
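Because each file has its own ID, deleting a recording's audio and transcription means preparing one Endpoints_DeleteLog request per ID. A sketch for a custom model endpoint, using only the Python standard library; build_delete_log_requests is a hypothetical helper, and it assumes the files/logs/{logId} path implied by the self URLs in the list response shown earlier.

```python
import urllib.request
from typing import Iterable, List


def build_delete_log_requests(
    region: str, subscription_key: str, endpoint_id: str, log_ids: Iterable[str]
) -> List[urllib.request.Request]:
    """Hypothetical helper: prepare one DELETE request per log ID for a custom model endpoint."""
    root = f"https://{region}.api.cognitive.microsoft.com/speechtotext/v3.2"
    return [
        urllib.request.Request(
            f"{root}/endpoints/{endpoint_id}/files/logs/{log_id}",
            method="DELETE",
            headers={"Ocp-Apim-Subscription-Key": subscription_key},
        )
        for log_id in log_ids
    ]


requests_to_send = build_delete_log_requests(
    "eastus",
    "YourSubscriptionKey",
    "YourEndpointId",
    [
        "2023-03-13_163715__0420c53d-e6ac-4857-bce0-f39c3f9f5ff9_v2_json",  # transcription
        "2023-03-13_163715__0420c53d-e6ac-4857-bce0-f39c3f9f5ff9_wav",  # audio
    ],
)
for req in requests_to_send:
    print(req.get_method(), req.full_url)
```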
