Migrate code from v3.1 to v3.2 of the REST API

The Speech to text REST API is used for Batch transcription and custom speech. This article describes changes from version 3.1 to 3.2.

Important

Speech to text REST API v3.2 is the latest version that's generally available. Preview versions 3.2-preview.1 and 3.2-preview.2 will be removed in September 2024. Speech to text REST API v3.1 will be retired on a date to be announced. Speech to text REST API v3.0 will be retired on April 1st, 2026.

Base path

You must update the base path in your code from /speechtotext/v3.1 to /speechtotext/v3.2. For example, to get base models in the eastus region, use https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/models/base instead of https://eastus.api.cognitive.microsoft.com/speechtotext/v3.1/models/base.
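As a minimal sketch of the base path change, the helper below rewrites a v3.1 URL so that it targets v3.2. The function name migrate_url is hypothetical and not part of any SDK; it only swaps the path prefix and leaves the host, query string, and the rest of the path untouched.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_BASE = "/speechtotext/v3.1"
NEW_BASE = "/speechtotext/v3.2"


def migrate_url(url: str) -> str:
    """Rewrite a v3.1 Speech to text URL so that it targets v3.2.

    Hypothetical helper for illustration; URLs that don't start with the
    v3.1 base path are returned unchanged.
    """
    parts = urlsplit(url)
    path = parts.path
    # Match the old base path only as a whole leading segment.
    if path == OLD_BASE or path.startswith(OLD_BASE + "/"):
        path = NEW_BASE + path[len(OLD_BASE):]
    return urlunsplit(parts._replace(path=path))


if __name__ == "__main__":
    print(migrate_url(
        "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.1/models/base"
    ))
```

A prefix swap like this covers every operation at once, because the operation paths after the version segment are unchanged in this example.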

For more information, see Operation IDs later in this guide.

Batch transcription

Important

New pricing is in effect for batch transcription via Speech to text REST API v3.2. For more information, see the pricing guide.

Backwards compatibility limitations

Don't use Speech to text REST API v3.0 or v3.1 to retrieve a transcription created via Speech to text REST API v3.2. You might see an error message such as: "The API version can't be used to access this transcription. Use API version v3.2 or higher."
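One way to guard against this error in client code is to check the API version in a URL before using it to retrieve a transcription that was created with v3.2. The sketch below is an illustration only: the helper names are hypothetical, the transcription ID in the usage example is a placeholder, and the version is parsed from the /speechtotext/v<major>.<minor> path segment shown earlier in this article.

```python
import re
from typing import Tuple

# Matches the version segment of a Speech to text REST API path.
_VERSION_RE = re.compile(r"/speechtotext/v(\d+)\.(\d+)")

# Minimum version that can read a transcription created with v3.2.
MIN_VERSION_FOR_V32_TRANSCRIPTIONS = (3, 2)


def api_version(url: str) -> Tuple[int, int]:
    """Return (major, minor) parsed from a Speech to text REST API URL."""
    match = _VERSION_RE.search(url)
    if match is None:
        raise ValueError(f"No API version found in URL: {url}")
    return int(match.group(1)), int(match.group(2))


def can_read_v32_transcription(url: str) -> bool:
    """True if the URL uses v3.2 or higher (hypothetical helper)."""
    return api_version(url) >= MIN_VERSION_FOR_V32_TRANSCRIPTIONS


if __name__ == "__main__":
    url = "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions/PLACEHOLDER_ID"
    print(can_read_v32_transcription(url))
```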

Language identification mode

LanguageIdentificationMode is added to LanguageIdentificationProperties as a sibling of candidateLocales and speechModelMapping. The available language identification modes are Continuous and Single. Continuous language identification is the default. For more information, see Language identification.
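The following sketch builds a language identification object that sets the mode explicitly. It assumes that the mode is serialized under a JSON key named "mode" next to candidateLocales; that key name and the helper name are assumptions, so confirm them against the v3.2 API reference before you rely on them.

```python
import json
from typing import Dict, Iterable

# The two modes named in this article; Continuous is the default.
VALID_MODES = ("Continuous", "Single")


def language_identification(candidate_locales: Iterable[str],
                            mode: str = "Continuous") -> Dict[str, object]:
    """Build a language identification object (hypothetical helper).

    Assumption: the mode is sent under the key "mode".
    """
    locales = list(candidate_locales)
    if not locales:
        raise ValueError("At least one candidate locale is required.")
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return {"candidateLocales": locales, "mode": mode}


if __name__ == "__main__":
    print(json.dumps(language_identification(["en-US", "de-DE"], "Single"), indent=2))
```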

Whisper models

Azure AI Speech now supports OpenAI's Whisper model via Speech to text REST API v3.2. To learn more, check out the Create a batch transcription guide.

Note

Azure OpenAI Service also supports OpenAI's Whisper model for speech to text with a synchronous REST API. To learn more, check out the quickstart. Check out What is the Whisper model? to learn more about when to use Azure AI Speech vs. Azure OpenAI Service.

Custom speech

Important

You're charged for custom speech model training if the base model was created on or after October 1, 2023. You aren't charged for training if the base model was created before October 1, 2023. For more information, see Azure AI Speech pricing.

To programmatically determine whether a model was created before or after October 1, 2023, use the chargeForAdaptation property that's new in version 3.2.

Custom display text formatting

To support model adaptation with custom display text formatting data, the Datasets_Create operation supports the OutputFormatting data kind. For more information, see upload datasets.

Added a definition for OutputFormatType with Lexical and Display enum values.

"OutputFormatType": {
    "title": "OutputFormatType",
    "enum": [
        "Lexical",
        "Display"
    ],
    "type": "string",
    "x-ms-enum": {
        "name": "OutputFormatType",
        "modelAsString": true,
        "values": [
            {
                "value": "Lexical",
                "description": "Model provides the transcription output without formatting."
            },
            {
                "value": "Display",
                "description": "Model supports display formatting transcriptions output or endpoints."
            }
        ]
    }
},

The OutputFormattingData enum value is added to FileKind (type of input data).

The supportedOutputFormat property is added to BaseModelFeatures. This property is within the BaseModel definition.

"BaseModelFeatures": {
    "title": "BaseModelFeatures",
    "description": "Features supported by the model.",
    "type": "object",
    "allOf": [
        {
            "$ref": "#/definitions/SharedModelFeatures"
        }
    ],
    "properties": {
        "supportsAdaptationsWith": {
            "description": "Supported dataset kinds to adapt the model.",
            "type": "array",
            "items": {
                "$ref": "#/definitions/DatasetKind"
            },
            "readOnly": true
        },
        "supportedOutputFormat": {
            "description": "Supported output formats.",
            "type": "array",
            "items": {
                "$ref": "#/definitions/OutputFormatType"
            },
            "readOnly": true
        }
    }
},
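As a rough illustration, the helper below checks whether a base model advertises a given output format. It assumes the base model JSON nests the features object under a top-level "properties" key, as the BaseModelProperties definition later in this article suggests; the helper name and the sample payload are hypothetical.

```python
from typing import Any, Dict


def supports_output_format(base_model: Dict[str, Any], output_format: str) -> bool:
    """Return True if the base model lists output_format as supported.

    Assumption: the value lives at properties.features.supportedOutputFormat.
    A missing property is treated as "not supported".
    """
    features = base_model.get("properties", {}).get("features", {})
    return output_format in features.get("supportedOutputFormat", [])


if __name__ == "__main__":
    # Hypothetical, trimmed-down base model payload.
    sample = {"properties": {"features": {"supportedOutputFormat": ["Lexical", "Display"]}}}
    print(supports_output_format(sample, "Display"))
```

Checking this property before you upload OutputFormatting data avoids preparing a dataset for a base model that can't use it.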

Charge for adaptation

The chargeForAdaptation property is added to BaseModelProperties. This property is within the BaseModel definition.

Important

You're charged for custom speech model training if the base model was created on or after October 1, 2023. You aren't charged for training if the base model was created before October 1, 2023. For more information, see Azure AI Speech pricing.

If the value of chargeForAdaptation is true, you're charged for training the model. If the value is false, you aren't charged for training the model. Use the chargeForAdaptation property instead of the created date to programmatically determine whether you're charged for training a model.

"BaseModelProperties": {
    "title": "BaseModelProperties",
    "type": "object",
    "properties": {
        "deprecationDates": {
            "$ref": "#/definitions/BaseModelDeprecationDates"
        },
        "features": {
            "$ref": "#/definitions/BaseModelFeatures"
        },
        "chargeForAdaptation": {
            "description": "A value indicating whether model adaptation is charged.",
            "type": "boolean",
            "readOnly": true
        }
    }
},
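A minimal sketch of reading this property from a base model response follows. It assumes the response JSON carries the BaseModelProperties object under a top-level "properties" key; the helper name and the sample payloads are hypothetical. The function returns None when the property is absent, for example in a response from an older API version, so that callers don't mistake a missing value for false.

```python
from typing import Any, Dict, Optional


def is_training_charged(base_model: Dict[str, Any]) -> Optional[bool]:
    """Return chargeForAdaptation, or None if the property is missing.

    Assumption: the value lives at properties.chargeForAdaptation.
    """
    value = base_model.get("properties", {}).get("chargeForAdaptation")
    if value is None:
        return None
    return bool(value)


if __name__ == "__main__":
    # Hypothetical, trimmed-down base model payloads.
    print(is_training_charged({"properties": {"chargeForAdaptation": True}}))
    print(is_training_charged({"properties": {}}))
```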

Text normalization

The textNormalizationKind property is added to DatasetProperties.

Entity definition for TextNormalizationKind: The kind of text normalization.

  • Default: Default text normalization (for example, 'two to three' replaces '2 to 3' in en-US).
  • None: No text normalization is applied to the input text. This value is an override option that should only be used when text is normalized before the upload.
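The two values above can be sketched as a small builder for the dataset properties object. The helper name is hypothetical, and the sketch covers only the textNormalizationKind property. Note that "None" here is the string enum value, not Python's None.

```python
from typing import Dict

# Enum values listed in this article for TextNormalizationKind.
TEXT_NORMALIZATION_KINDS = ("Default", "None")


def dataset_properties(text_normalization_kind: str = "Default") -> Dict[str, str]:
    """Build a DatasetProperties fragment (hypothetical helper)."""
    if text_normalization_kind not in TEXT_NORMALIZATION_KINDS:
        raise ValueError(
            f"textNormalizationKind must be one of {TEXT_NORMALIZATION_KINDS}, "
            f"got {text_normalization_kind!r}"
        )
    return {"textNormalizationKind": text_normalization_kind}


if __name__ == "__main__":
    # Use "None" only when the text was normalized before the upload.
    print(dataset_properties("None"))
```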

Evaluation properties

Added the following token count and token error properties to EvaluationProperties:

  • correctTokenCount1: The number of correctly recognized tokens by model1.
  • tokenCount1: The number of processed tokens by model1.
  • tokenDeletionCount1: The number of recognized tokens by model1 that are deletions.
  • tokenErrorRate1: The token error rate of recognition with model1.
  • tokenInsertionCount1: The number of recognized tokens by model1 that are insertions.
  • tokenSubstitutionCount1: The number of recognized tokens by model1 that are substitutions.
  • correctTokenCount2: The number of correctly recognized tokens by model2.
  • tokenCount2: The number of processed tokens by model2.
  • tokenDeletionCount2: The number of recognized tokens by model2 that are deletions.
  • tokenErrorRate2: The token error rate of recognition with model2.
  • tokenInsertionCount2: The number of recognized tokens by model2 that are insertions.
  • tokenSubstitutionCount2: The number of recognized tokens by model2 that are substitutions.
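To show how the token properties listed above might be consumed, the sketch below collects the values for one model and compares the two token error rates. The helper names and the sample numbers are hypothetical; the input is assumed to be the EvaluationProperties object already parsed into a dictionary.

```python
from typing import Any, Dict, Optional

# Property name stems from the list above; the model number is appended.
TOKEN_PROPERTY_STEMS = (
    "correctTokenCount",
    "tokenCount",
    "tokenDeletionCount",
    "tokenErrorRate",
    "tokenInsertionCount",
    "tokenSubstitutionCount",
)


def token_summary(properties: Dict[str, Any], model: int) -> Dict[str, Any]:
    """Collect the token properties for model 1 or model 2."""
    if model not in (1, 2):
        raise ValueError("model must be 1 or 2")
    return {stem: properties.get(f"{stem}{model}") for stem in TOKEN_PROPERTY_STEMS}


def model_with_lower_token_error_rate(properties: Dict[str, Any]) -> Optional[int]:
    """Return 1 or 2, or None if the rates are equal or missing."""
    rate1 = properties.get("tokenErrorRate1")
    rate2 = properties.get("tokenErrorRate2")
    if rate1 is None or rate2 is None or rate1 == rate2:
        return None
    return 1 if rate1 < rate2 else 2


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    sample = {"tokenErrorRate1": 7.5, "tokenErrorRate2": 5.0, "tokenCount1": 200}
    print(token_summary(sample, 1))
    print(model_with_lower_token_error_rate(sample))
```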

Model copy

The following changes are for the scenario where you copy a model.

  • Added the new Models_Copy operation. Here's the schema in the new copy operation: "$ref": "#/definitions/ModelCopyAuthorization"
  • Deprecated the Models_CopyTo operation. Here's the schema in the deprecated copy operation: "$ref": "#/definitions/ModelCopy"
  • Added the new Models_AuthorizeCopy operation that returns "$ref": "#/definitions/ModelCopyAuthorization". This returned entity can be used in the new Models_Copy operation.

Added a new entity definition for ModelCopyAuthorization:

"ModelCopyAuthorization": {
    "title": "ModelCopyAuthorization",
    "required": [
        "expirationDateTime",
        "id",
        "sourceResourceId",
        "targetResourceEndpoint",
        "targetResourceId",
        "targetResourceRegion"
    ],
    "type": "object",
    "properties": {
        "targetResourceRegion": {
            "description": "The region (aka location) of the target speech resource (e.g., westus2).",
            "minLength": 1,
            "type": "string"
        },
        "targetResourceId": {
            "description": "The Azure Resource ID of the target speech resource.",
            "minLength": 1,
            "type": "string"
        },
        "targetResourceEndpoint": {
            "description": "The endpoint (base url) of the target resource (with custom domain name when it is used).",
            "minLength": 1,
            "type": "string"
        },
        "sourceResourceId": {
            "description": "The Azure Resource ID of the source speech resource.",
            "minLength": 1,
            "type": "string"
        },
        "expirationDateTime": {
            "format": "date-time",
            "description": "The expiration date of this copy authorization.",
            "type": "string"
        },
        "id": {
            "description": "The ID of this copy authorization.",
            "minLength": 1,
            "type": "string"
        }
    }
},
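Before you pass a ModelCopyAuthorization to the copy operation, it can help to confirm that the required fields are present and that the authorization hasn't expired. The sketch below does both, based only on the schema above. The helper names and the sample values are hypothetical, and the timestamp parsing assumes an ISO 8601 value without unusual fractional-second precision.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional

# Required fields from the ModelCopyAuthorization definition.
REQUIRED_FIELDS = (
    "expirationDateTime",
    "id",
    "sourceResourceId",
    "targetResourceEndpoint",
    "targetResourceId",
    "targetResourceRegion",
)


def missing_fields(authorization: Dict[str, Any]) -> List[str]:
    """Return required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not authorization.get(name)]


def is_expired(authorization: Dict[str, Any], now: Optional[datetime] = None) -> bool:
    """Return True if expirationDateTime is at or before `now` (UTC)."""
    now = now or datetime.now(timezone.utc)
    raw = authorization["expirationDateTime"]
    # Older Python versions don't accept a trailing "Z" in fromisoformat.
    expires = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    if expires.tzinfo is None:
        # Assumption: a timestamp without an offset is in UTC.
        expires = expires.replace(tzinfo=timezone.utc)
    return now >= expires


if __name__ == "__main__":
    # Hypothetical, incomplete payload for illustration only.
    sample = {"id": "example-id", "expirationDateTime": "2024-01-01T00:00:00Z"}
    print(missing_fields(sample))
    print(is_expired(sample))
```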

Added a new entity definition for ModelCopyAuthorizationDefinition:

"ModelCopyAuthorizationDefinition": {
    "title": "ModelCopyAuthorizationDefinition",
    "required": [
        "sourceResourceId"
    ],
    "type": "object",
    "properties": {
        "sourceResourceId": {
            "description": "The Azure Resource ID of the source speech resource.",
            "minLength": 1,
            "type": "string"
        }
    }
},

Added a new copy property to CustomModelLinks. The existing copyTo property is kept for the obsolete operation.

  • copyTo URI: The location of the obsolete model copy action. See the Models_CopyTo operation for more details.
  • copy URI: The location of the model copy action. See the Models_Copy operation for more details.

"CustomModelLinks": {
    "title": "CustomModelLinks",
    "type": "object",
    "properties": {
      "copyTo": {
        "format": "uri",
        "description": "The location to the obsolete model copy action. See operation \"Models_CopyTo\" for more details.",
        "type": "string",
        "readOnly": true
      },
      "copy": {
        "format": "uri",
        "description": "The location to the model copy action. See operation \"Models_Copy\" for more details.",
        "type": "string",
        "readOnly": true
      },
      "files": {
        "format": "uri",
        "description": "The location to get all files of this entity. See operation \"Models_ListFiles\" for more details.",
        "type": "string",
        "readOnly": true
      },
      "manifest": {
        "format": "uri",
        "description": "The location to get a manifest for this model to be used in the on-prem container. See operation \"Models_GetCustomModelManifest\" for more details.",
        "type": "string",
        "readOnly": true
      }
    },
    "readOnly": true
},
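Client code that follows these links might prefer the new copy link and fall back to the obsolete copyTo link only when copy is absent. The sketch below assumes the custom model JSON exposes the CustomModelLinks object under a top-level "links" key; that key, the helper name, and the sample URL are assumptions. It returns the operation name along with the URL, because the two operations take different request bodies.

```python
from typing import Any, Dict, Optional, Tuple


def pick_copy_link(model: Dict[str, Any]) -> Optional[Tuple[str, str]]:
    """Return (operation_id, url) for the preferred copy action, or None.

    Assumption: the links object lives under the "links" key of the model.
    """
    links = model.get("links", {})
    if links.get("copy"):
        return "Models_Copy", links["copy"]
    if links.get("copyTo"):
        # Obsolete operation; kept only as a fallback.
        return "Models_CopyTo", links["copyTo"]
    return None


if __name__ == "__main__":
    # Hypothetical payload with placeholder URLs.
    sample = {"links": {"copy": "https://example.invalid/copy",
                        "copyTo": "https://example.invalid/copyto"}}
    print(pick_copy_link(sample))
```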

Operation IDs

You must update the base path in your code from /speechtotext/v3.1 to /speechtotext/v3.2. For example, to get base models in the eastus region, use https://eastus.api.cognitive.microsoft.com/speechtotext/v3.2/models/base instead of https://eastus.api.cognitive.microsoft.com/speechtotext/v3.1/models/base.
