Azure Batch Speech-to-text
Accurately transcribe audio to text in more than 100 languages and variants. As part of Azure AI Speech service, Batch Transcription enables you to transcribe a large amount of audio in storage. You can point to audio files with a shared access signature (SAS) URI and asynchronously receive transcription results.
This connector is available in the following products and regions:
Service | Class | Regions |
---|---|---|
Logic Apps | Standard | All Logic Apps regions except Azure China regions |
Power Automate | Standard | All Power Automate regions except China Cloud operated by 21Vianet |
Power Apps | Standard | All Power Apps regions except China Cloud operated by 21Vianet |
Contact | |
---|---|
Name | Speech Service Power Platform Team |
URL | https://docs.microsoft.com/azure/cognitive-services/speech-service/support |
Email | speechpowerplatform@microsoft.com |
Connector Metadata | |
---|---|
Publisher | Microsoft |
Website | https://docs.microsoft.com/azure/cognitive-services/speech-service/ |
Privacy policy | https://privacy.microsoft.com |
Categories | AI;Website |
The Speech Services batch transcription API is a cloud-based service that provides asynchronous batch processing of speech recognition over the audio content you provide. This connector exposes those functions as operations in Microsoft Power Automate and Power Apps.
Prerequisites
You will need the following to proceed:
- Azure subscription - Create one for free
- Create a Speech resource in the Azure portal.
- Get the Speech resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For more information about Cognitive Services resources, see Get the keys for your resource.
- Upload your own data or use existing audio files via a public URI or shared access signature (SAS) URI; a sketch for generating a SAS URI follows this list.
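The SAS URI mentioned in the last item can be generated programmatically. The following is a minimal sketch using the `azure-storage-blob` Python package; the storage account, container, and blob names are placeholders, and the resulting URI is what you would pass to the Create transcription action.

```python
from datetime import datetime, timedelta, timezone

from azure.storage.blob import BlobSasPermissions, generate_blob_sas

# Placeholder storage details - replace with your own account, container, and blob.
ACCOUNT_NAME = "mystorageaccount"
ACCOUNT_KEY = "<storage-account-key>"
CONTAINER = "audio"
BLOB = "meeting-recording.wav"

# Create a read-only SAS token that expires in 24 hours.
sas_token = generate_blob_sas(
    account_name=ACCOUNT_NAME,
    container_name=CONTAINER,
    blob_name=BLOB,
    account_key=ACCOUNT_KEY,
    permission=BlobSasPermissions(read=True),
    expiry=datetime.now(timezone.utc) + timedelta(hours=24),
)

# The resulting URI can be supplied in the contentUrls parameter of Create transcription.
sas_uri = f"https://{ACCOUNT_NAME}.blob.core.windows.net/{CONTAINER}/{BLOB}?{sas_token}"
print(sas_uri)
```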
Creating a connection
The connector supports the following authentication types:
Name | Description | Applicable regions | Shareable |
---|---|---|---|
Api Key | ApiKey | All regions | Shareable |
Azure AD Integrated | Use Azure Active Directory to access your speech service. | US Government (GCC) only | Not shareable |
Azure AD Integrated (Azure Government) | Use Azure Active Directory to access your speech service. | Azure Government and Department of Defense (DoD) in Azure Government and US Government (GCC-High) only | Not shareable |
Microsoft Entra ID Integrated | Use Microsoft Entra ID to access your speech service. | All regions except Azure Government and Department of Defense (DoD) in Azure Government and US Government (GCC) and US Government (GCC-High) | Not shareable |
Default [DEPRECATED] | This option is only for older connections without an explicit authentication type, and is only provided for backward compatibility. | All regions | Not shareable |
Api Key
Auth ID: keyBasedAuth
Applicable: All regions
ApiKey
This is a shareable connection. If the power app is shared with another user, the connection is shared as well. For more information, see Connectors overview for canvas apps - Power Apps | Microsoft Docs.
Name | Type | Description | Required |
---|---|---|---|
Account Key | securestring | Speech service key | True |
Region | string | Speech service region (Example: eastus) | True |
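For context, the Account Key and Region connection parameters map directly onto the public Speech to text v3.1 REST endpoint that the connector wraps. A minimal sketch using the `requests` package, with placeholder key and region values:

```python
import requests

SPEECH_KEY = "<speech-resource-key>"  # "Account Key" in the connection dialog
SPEECH_REGION = "eastus"              # "Region" in the connection dialog

# The key travels in the Ocp-Apim-Subscription-Key header; the region selects the host.
endpoint = f"https://{SPEECH_REGION}.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions"
headers = {"Ocp-Apim-Subscription-Key": SPEECH_KEY}

# List existing transcriptions to confirm the key and region are valid.
response = requests.get(endpoint, headers=headers, timeout=30)
response.raise_for_status()
print(response.json())
```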
Azure AD Integrated
Auth ID: tokenBasedAuth
Applicable: US Government (GCC) only
Use Azure Active Directory to access your speech service.
This is not a shareable connection. If the power app is shared with another user, the other user will be prompted to create a new connection explicitly.
Name | Type | Description | Required |
---|---|---|---|
Custom Subdomain | string | Custom subdomain endpoint URL (Example: contoso) | True |
Azure AD Integrated (Azure Government)
Auth ID: tokenBasedAuth
Applicable: Azure Government and Department of Defense (DoD) in Azure Government and US Government (GCC-High) only
Use Azure Active Directory to access your speech service.
This is not a shareable connection. If the power app is shared with another user, the other user will be prompted to create a new connection explicitly.
Name | Type | Description | Required |
---|---|---|---|
Custom Subdomain | string | Custom subdomain endpoint URL (Example: contoso) | True |
Microsoft Entra ID Integrated
Auth ID: tokenBasedAuth
Applicable: All regions except Azure Government and Department of Defense (DoD) in Azure Government and US Government (GCC) and US Government (GCC-High)
Use Microsoft Entra ID to access your speech service.
This is not a shareable connection. If the power app is shared with another user, the other user will be prompted to create a new connection explicitly.
Name | Type | Description | Required |
---|---|---|---|
Custom Subdomain | string | Custom subdomain endpoint URL (Example: contoso) | True |
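With the token-based options, the connector authenticates with a Microsoft Entra ID token against a custom subdomain endpoint rather than a resource key. A hedged sketch using the `azure-identity` package; the subdomain is a placeholder, and it assumes the signed-in identity has an appropriate role assignment on the Speech resource:

```python
import requests
from azure.identity import DefaultAzureCredential

SUBDOMAIN = "contoso"  # the "Custom Subdomain" connection parameter

# Acquire a Microsoft Entra ID token for the Cognitive Services scope.
credential = DefaultAzureCredential()
token = credential.get_token("https://cognitiveservices.azure.com/.default")

# Custom-subdomain endpoints accept bearer tokens instead of resource keys.
endpoint = f"https://{SUBDOMAIN}.cognitiveservices.azure.com/speechtotext/v3.1/transcriptions"
headers = {"Authorization": f"Bearer {token.token}"}

response = requests.get(endpoint, headers=headers, timeout=30)
response.raise_for_status()
print(response.json())
```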
Default [DEPRECATED]
Applicable: All regions
This option is only for older connections without an explicit authentication type, and is only provided for backward compatibility.
This is not a shareable connection. If the power app is shared with another user, the other user will be prompted to create a new connection explicitly.
Name | Type | Description | Required |
---|---|---|---|
Account Key | securestring | Azure Cognitive Services for Batch Speech-to-text Account Key | True |
Region | string | Speech service region (Example: eastus) | True |
Throttling Limits
Name | Calls | Renewal Period |
---|---|---|
API calls per connection | 100 | 60 seconds |
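Because the limit is 100 calls per connection per 60 seconds, a client that loops over many files should back off when it is throttled. A minimal retry sketch; the URL and headers are whatever the operation being called requires:

```python
import time

import requests


def call_with_backoff(url, headers, max_retries=5):
    """GET a batch transcription URL, retrying when the service throttles (HTTP 429)."""
    for _ in range(max_retries):
        response = requests.get(url, headers=headers, timeout=30)
        if response.status_code != 429:
            response.raise_for_status()
            return response.json()
        # Honor Retry-After if the service provides it, otherwise wait out the 60-second window.
        time.sleep(int(response.headers.get("Retry-After", 60)))
    raise RuntimeError("Still throttled after retries")
```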
Actions
Action | Description |
---|---|
Create transcription (V3.1) | Creates a new transcription. |
Delete transcription (V3.1) | Deletes the specified transcription task. |
Get supported locales (V3.1) | Gets a list of supported locales for offline transcriptions. |
Get transcription file (V3.1) | Gets one specific file (identified with fileId) from a transcription (identified with id). |
Get transcriptions (V3.1) | Gets the transcription identified by the given ID. |
Get transcriptions list (V3.1) | Gets a list of transcriptions for the authenticated subscription. |
Get transcriptions list files (V3.1) | Gets the files of the transcription identified by the given ID. |
Update transcription (V3.1) | Updates the mutable details of the transcription identified by its ID. |
Create transcription (V3.1)
Creates a new transcription.
Parameters
Name | Key | Required | Type | Description |
---|---|---|---|---|
contentUrls | contentUrls | | array of uri | You can provide a list of content URLs to get audio files to transcribe. Up to 1000 URLs are allowed. This property will not be returned in a response. |
contentContainerUrl | contentContainerUrl | | uri | Alternatively, you can provide a URL for an Azure blob container that contains the audio files. A container is allowed to have a maximum size of 5 GB and a maximum number of 10000 blobs. The maximum size for a blob is 2.5 GB. The container SAS should contain 'r' (read) and 'l' (list) permissions. This property will not be returned in a response. |
locale | locale | True | string | The locale of the contained data. If Language Identification is used, this locale is used to transcribe speech for which no language could be detected. |
displayName | displayName | True | string | The display name of the object. |
model | self | | uri | The location of the referenced entity. |
diarizationEnabled | diarizationEnabled | | boolean | A value indicating whether diarization (speaker identification) is requested. The default value is `false`. |
wordLevelTimestampsEnabled | wordLevelTimestampsEnabled | | boolean | A value indicating whether word level timestamps are requested. The default value is `false`. |
displayFormWordLevelTimestampsEnabled | displayFormWordLevelTimestampsEnabled | | boolean | A value indicating whether word level timestamps for the display form are requested. The default value is `false`. |
channels | channels | | array of integer | A collection of the requested channel numbers. In the default case, the channels 0 and 1 are considered. |
destinationContainerUrl | destinationContainerUrl | | uri | The requested destination container. Remarks: When a destination container is used in combination with a |
punctuationMode | punctuationMode | | string | The mode used for punctuation. |
profanityFilterMode | profanityFilterMode | | string | Mode of profanity filtering. |
timeToLive | timeToLive | | string | How long the transcription will be kept in the system after it has completed. Once the transcription reaches the time to live after completion (successful or failed), it will be automatically deleted. Not setting this value or setting it to 0 will disable automatic deletion. The longest supported duration is 31 days. The duration is encoded as ISO 8601 duration ("PnYnMnDTnHnMnS", see https://en.wikipedia.org/wiki/ISO_8601#Durations). |
minCount | minCount | | integer | A hint for the minimum number of speakers for diarization. Must be smaller than or equal to the maxSpeakers property. |
maxCount | maxCount | | integer | The maximum number of speakers for diarization. Must be less than 36 and larger than or equal to the minSpeakers property. |
candidateLocales | candidateLocales | True | array of string | The candidate locales for language identification (example ["en-US", "de-DE", "es-ES"]). A minimum of 2 and a maximum of 10 candidate locales, including the main locale for the transcription, is supported. |
speechModelMapping | speechModelMapping | | object | An optional mapping of locales to speech model entities. If no model is given for a locale, the default base model is used. Keys must be locales contained in the candidate locales; values are entities for models of the respective locales. |
email | email | | string | The email address to send email notifications to in case the operation completes. The value will be removed after successfully sending the email. |
Returns
- Body
- Transcription
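For reference, the parameters above form the JSON body of the `POST /speechtotext/v3.1/transcriptions` call that this action wraps. A hedged sketch setting a few of the optional properties; all URLs, names, and the key and region values are placeholders:

```python
import requests

SPEECH_KEY = "<speech-resource-key>"
SPEECH_REGION = "eastus"

body = {
    "displayName": "Weekly meeting batch",
    "locale": "en-US",
    # Up to 1000 SAS or public URIs pointing at the audio files to transcribe.
    "contentUrls": ["https://example.blob.core.windows.net/audio/meeting.wav?<sas>"],
    "properties": {
        "diarizationEnabled": True,
        "wordLevelTimestampsEnabled": True,
        "punctuationMode": "DictatedAndAutomatic",
        "profanityFilterMode": "Masked",
        # ISO 8601 duration: delete the transcription two days after completion.
        "timeToLive": "P2D",
        "languageIdentification": {
            "candidateLocales": ["en-US", "de-DE", "es-ES"],
        },
    },
}

response = requests.post(
    f"https://{SPEECH_REGION}.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions",
    headers={"Ocp-Apim-Subscription-Key": SPEECH_KEY},
    json=body,
    timeout=30,
)
response.raise_for_status()

# The "self" URL of the returned Transcription is used later for polling and updates.
print(response.json()["self"])
```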
Delete transcription (V3.1)
Deletes the specified transcription task.
Parameters
Name | Key | Required | Type | Description |
---|---|---|---|---|
Id | id | True | uuid | The identifier of the transcription. |
Get supported locales (V3.1)
Gets a list of supported locales for offline transcriptions.
Returns
Name | Path | Type | Description |
---|---|---|---|
 | | array of string | |
Get transcription file (V3.1)
Gets one specific file (identified with fileId) from a transcription (identified with id).
Parameters
Name | Key | Required | Type | Description |
---|---|---|---|---|
Id | id | True | uuid | The identifier of the transcription. |
File Id | fileId | True | uuid | The identifier of the file. |
Sas Validity In Seconds | sasValidityInSeconds | | integer | The duration in seconds that a SAS URL should be valid. The default duration is 12 hours. When using BYOS (https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-encryption-of-data-at-rest#bring-your-own-storage-byos-for-customization-and-logging), a value of 0 means that a plain blob URI without SAS token will be generated. |
Returns
- Body
- File
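Once a transcription has completed, each result is exposed as a File whose `contentUrl` is a time-limited SAS link. A sketch that fetches one file's metadata and then downloads its JSON content; the transcription and file IDs are placeholders:

```python
import requests

SPEECH_KEY = "<speech-resource-key>"
SPEECH_REGION = "eastus"
TRANSCRIPTION_ID = "<transcription-id>"
FILE_ID = "<file-id>"

base = f"https://{SPEECH_REGION}.api.cognitive.microsoft.com/speechtotext/v3.1"
headers = {"Ocp-Apim-Subscription-Key": SPEECH_KEY}

# Ask for a SAS link valid for one hour instead of the 12-hour default.
meta = requests.get(
    f"{base}/transcriptions/{TRANSCRIPTION_ID}/files/{FILE_ID}",
    headers=headers,
    params={"sasValidityInSeconds": 3600},
    timeout=30,
).json()

# The contentUrl already carries the SAS token, so no extra auth header is needed.
result = requests.get(meta["links"]["contentUrl"], timeout=30).json()
print(result)
```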
Get transcriptions (V3.1)
Gets the transcription identified by the given ID.
Parameters
Name | Key | Required | Type | Description |
---|---|---|---|---|
Id | id | True | uuid | The identifier of the transcription. |
Returns
- Body
- Transcription
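Because batch transcription runs asynchronously, a typical client calls this operation repeatedly until the job reaches a terminal state. A minimal polling sketch, assuming the returned Transcription carries a `status` field (NotStarted, Running, Succeeded, or Failed) as in the Speech to text v3.1 REST API; the transcription URL is the `self` link returned by Create transcription:

```python
import time

import requests


def wait_for_transcription(transcription_url, headers, poll_seconds=30):
    """Poll a transcription's self URL until it reports Succeeded or Failed."""
    while True:
        status = requests.get(transcription_url, headers=headers, timeout=30).json()["status"]
        if status in ("Succeeded", "Failed"):
            return status
        time.sleep(poll_seconds)
```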
Get transcriptions list (V3.1)
Gets a list of transcriptions for the authenticated subscription.
Parameters
Name | Key | Required | Type | Description |
---|---|---|---|---|
Skip | skip | | integer | Number of datasets that will be skipped. |
Top | top | | integer | Number of datasets that will be included after skipping. |
Filter | filter | | string | A filtering expression for selecting a subset of the available transcriptions. |
Returns
- Body
- PaginatedTranscriptions
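The `skip` and `top` parameters page through the collection, and each page's `@nextLink` points at the next one. A paging sketch with placeholder credentials:

```python
import requests

SPEECH_KEY = "<speech-resource-key>"
SPEECH_REGION = "eastus"
headers = {"Ocp-Apim-Subscription-Key": SPEECH_KEY}

url = (
    f"https://{SPEECH_REGION}.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions"
    "?skip=0&top=20"
)

transcriptions = []
while url:
    page = requests.get(url, headers=headers, timeout=30).json()
    transcriptions.extend(page["values"])
    # '@nextLink' is null once the last page has been fetched.
    url = page.get("@nextLink")

print(f"{len(transcriptions)} transcriptions found")
```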
Get transcriptions list files (V3.1)
Gets the files of the transcription identified by the given ID.
Parameters
Name | Key | Required | Type | Description |
---|---|---|---|---|
Id | id | True | uuid | The identifier of the transcription. |
Sas Validity In Seconds | sasValidityInSeconds | | integer | The duration in seconds that a SAS URL should be valid. The default duration is 12 hours. When using BYOS (https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-encryption-of-data-at-rest#bring-your-own-storage-byos-for-customization-and-logging), a value of 0 means that a plain blob URI without SAS token will be generated. |
Skip | skip | | integer | Number of datasets that will be skipped. |
Top | top | | integer | Number of datasets that will be included after skipping. |
Filter | filter | | string | A filtering expression for selecting a subset of the available files. |
Returns
- Body
- PaginatedFiles
Update transcription (V3.1)
Updates the mutable details of the transcription identified by its ID.
Parameters
Name | Key | Required | Type | Description |
---|---|---|---|---|
Id | id | True | uuid | The identifier of the transcription. |
self | self | True | uri | The location of the referenced entity. |
displayName | displayName | | string | The name of the object. |
description | description | | string | The description of the object. |
customProperties | customProperties | | object | The custom properties of this entity. The maximum allowed key length is 64 characters, the maximum allowed value length is 256 characters, and the count of allowed entries is 10. |
Returns
- Body
- Transcription
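The underlying REST call is a PATCH against the transcription's URL, and only the mutable fields listed above may appear in the body. A sketch with placeholder IDs and values:

```python
import requests

SPEECH_KEY = "<speech-resource-key>"
SPEECH_REGION = "eastus"
TRANSCRIPTION_ID = "<transcription-id>"

url = (
    f"https://{SPEECH_REGION}.api.cognitive.microsoft.com"
    f"/speechtotext/v3.1/transcriptions/{TRANSCRIPTION_ID}"
)

# Only mutable details: display name, description, and custom properties.
patch = {
    "displayName": "Weekly meeting batch (renamed)",
    "description": "Re-labelled after review",
    "customProperties": {"project": "support-calls"},
}

response = requests.patch(
    url,
    headers={"Ocp-Apim-Subscription-Key": SPEECH_KEY},
    json=patch,
    timeout=30,
)
response.raise_for_status()
print(response.json()["displayName"])
```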
Definitions
DiarizationProperties
Name | Path | Type | Description |
---|---|---|---|
speakers | speakers | DiarizationSpeakersProperties | |
DiarizationSpeakersProperties
Name | Path | Type | Description |
---|---|---|---|
minCount | minCount | integer | A hint for the minimum number of speakers for diarization. Must be smaller than or equal to the maxSpeakers property. |
maxCount | maxCount | integer | The maximum number of speakers for diarization. Must be less than 36 and larger than or equal to the minSpeakers property. |
File
Name | Path | Type | Description |
---|---|---|---|
kind | kind | FileKind | Type of data. |
links | links | FileLinks | |
createdDateTime | createdDateTime | date-time | The creation time of this file. The time stamp is encoded as ISO 8601 date and time format (see https://en.wikipedia.org/wiki/ISO_8601#Combined_date_and_time_representations). |
properties | properties | FileProperties | |
name | name | string | The name of this file. |
FileKind
FileLinks
Name | Path | Type | Description |
---|---|---|---|
contentUrl | contentUrl | uri | The URL to retrieve the content of this file. |
FileProperties
Name | Path | Type | Description |
---|---|---|---|
size | size | integer | The size of the data in bytes. |
duration | duration | string | The duration in case this file is an audio file. The duration is encoded as ISO 8601 duration ("PnYnMnDTnHnMnS", see https://en.wikipedia.org/wiki/ISO_8601#Durations). |
LanguageIdentificationProperties
Name | Path | Type | Description |
---|---|---|---|
candidateLocales | candidateLocales | array of string | The candidate locales for language identification (example ["en-US", "de-DE", "es-ES"]). A minimum of 2 and a maximum of 10 candidate locales, including the main locale for the transcription, is supported. |
speechModelMapping | speechModelMapping | object | An optional mapping of locales to speech model entities. If no model is given for a locale, the default base model is used. Keys must be locales contained in the candidate locales; values are entities for models of the respective locales. |
PaginatedFiles
Name | Path | Type | Description |
---|---|---|---|
values | values | array of File | A list of entities limited by either the passed query parameters 'skip' and 'top' or their default values. When iterating through a list using pagination and deleting entities in parallel, some entities will be skipped in the results. It's recommended to build a list on the client and delete after the fetching of the complete list. |
@nextLink | @nextLink | uri | A link to the next set of paginated results if there are more entities available; otherwise null. |
PaginatedTranscriptions
Name | Path | Type | Description |
---|---|---|---|
values | values | array of Transcription | A list of entities limited by either the passed query parameters 'skip' and 'top' or their default values. When iterating through a list using pagination and deleting entities in parallel, some entities will be skipped in the results. It's recommended to build a list on the client and delete after the fetching of the complete list. |
@nextLink | @nextLink | uri | A link to the next set of paginated results if there are more entities available; otherwise null. |
ProfanityFilterMode
PunctuationMode
Transcription
Name | Path | Type | Description |
---|---|---|---|
contentUrls | contentUrls | array of uri | You can provide a list of content URLs to get audio files to transcribe. Up to 1000 URLs are allowed. This property will not be returned in a response. |
contentContainerUrl | contentContainerUrl | uri | Alternatively, you can provide a URL for an Azure blob container that contains the audio files. A container is allowed to have a maximum size of 5 GB and a maximum number of 10000 blobs. The maximum size for a blob is 2.5 GB. The container SAS should contain 'r' (read) and 'l' (list) permissions. This property will not be returned in a response. |
locale | locale | string | The locale of the contained data. If Language Identification is used, this locale is used to transcribe speech for which no language could be detected. |
displayName | displayName | string | The display name of the object. |
model | model.self | uri | The location of the referenced entity. |
properties | properties | TranscriptionProperties | |
TranscriptionProperties
Name | Path | Type | Description |
---|---|---|---|
diarizationEnabled | diarizationEnabled | boolean | A value indicating whether diarization (speaker identification) is requested. The default value is `false`. |
wordLevelTimestampsEnabled | wordLevelTimestampsEnabled | boolean | A value indicating whether word level timestamps are requested. The default value is `false`. |
displayFormWordLevelTimestampsEnabled | displayFormWordLevelTimestampsEnabled | boolean | A value indicating whether word level timestamps for the display form are requested. The default value is `false`. |
channels | channels | array of integer | A collection of the requested channel numbers. In the default case, the channels 0 and 1 are considered. |
destinationContainerUrl | destinationContainerUrl | uri | The requested destination container. Remarks: When a destination container is used in combination with a |
punctuationMode | punctuationMode | PunctuationMode | The mode used for punctuation. |
profanityFilterMode | profanityFilterMode | ProfanityFilterMode | Mode of profanity filtering. |
timeToLive | timeToLive | string | How long the transcription will be kept in the system after it has completed. Once the transcription reaches the time to live after completion (successful or failed), it will be automatically deleted. Not setting this value or setting it to 0 will disable automatic deletion. The longest supported duration is 31 days. The duration is encoded as ISO 8601 duration ("PnYnMnDTnHnMnS", see https://en.wikipedia.org/wiki/ISO_8601#Durations). |
diarization | diarization | DiarizationProperties | |
Language Identification | languageIdentification | LanguageIdentificationProperties | |
email | email | string | The email address to send email notifications to in case the operation completes. The value will be removed after successfully sending the email. |