Azure Batch Speech-to-text

Reference

Accurately transcribe audio to text in more than 100 languages and variants. As part of Azure AI Speech service, Batch Transcription enables you to transcribe a large amount of audio in storage. You can point to audio files with a shared access signature (SAS) URI and asynchronously receive transcription results.

This connector is available in the following products and regions:

Service	Class	Regions
Logic Apps	Standard	All Logic Apps regions except Azure China regions
Power Automate	Standard	All Power Automate regions except China Cloud operated by 21Vianet
Power Apps	Standard	All Power Apps regions except China Cloud operated by 21Vianet

Contact
Name	Speech Service Power Platform Team
URL	https://docs.microsoft.com/azure/cognitive-services/speech-service/support
Email	speechpowerplatform@microsoft.com

Connector Metadata
Publisher	Microsoft
Website	https://docs.microsoft.com/azure/cognitive-services/speech-service/
Privacy policy	https://privacy.microsoft.com
Categories	AI;Website

The Speech Services batch transcription API is a cloud-based service that performs asynchronous batch speech recognition on the audio content you provide. This connector exposes these functions as operations in Microsoft Power Automate and Power Apps.

Pre-requisites

You will need the following to proceed:

Azure subscription - Create one for free
Create a Speech resource in the Azure portal.
Get the Speech resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys. For more information about Cognitive Services resources, see Get the keys for your resource.
Upload your own data or use existing audio files via public URI or shared access signature (SAS) URI.

Creating a connection

The connector supports the following authentication types:


Api Key	ApiKey	All regions	Shareable
Microsoft Entra ID Integrated	Use Microsoft Entra ID to access your speech service.	All regions except Azure Government and Department of Defense (DoD) in Azure Government and US Government (GCC-High)	Not shareable
Microsoft Entra ID Integrated (Azure Government)	Use Microsoft Entra ID to access your speech service.	Azure Government and Department of Defense (DoD) in Azure Government and US Government (GCC-High) only	Not shareable
Default [DEPRECATED]	This option is only for older connections without an explicit authentication type, and is only provided for backward compatibility.	All regions	Not shareable

Api Key

Auth ID: keyBasedAuth

Applicable: All regions

ApiKey

This is a shareable connection. If the Power App is shared with another user, the connection is shared as well. For more information, see Connectors overview for canvas apps - Power Apps | Microsoft Docs.

Name	Type	Description	Required
Account Key	securestring	Speech service key	True
Region	string	Speech service region (Example: eastus)	True
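
For orientation, the two connection parameters correspond to what a direct call to the Speech service would need. The sketch below assumes the v3.1 REST endpoint pattern `https://{region}.api.cognitive.microsoft.com/speechtotext/v3.1` and the `Ocp-Apim-Subscription-Key` header; neither is stated on this page, and the helper names `base_url` and `auth_headers` are hypothetical.

```python
def base_url(region: str) -> str:
    """Build the assumed v3.1 base URL for a Speech resource region."""
    region = region.strip().lower()
    # Region names such as "eastus" or "westeurope" are plain alphanumerics.
    if not region or not region.isalnum():
        raise ValueError(f"unexpected region name: {region!r}")
    return f"https://{region}.api.cognitive.microsoft.com/speechtotext/v3.1"


def auth_headers(account_key: str) -> dict[str, str]:
    """Headers for key-based authentication (header name is an assumption)."""
    return {
        "Ocp-Apim-Subscription-Key": account_key,
        "Content-Type": "application/json",
    }
```

The connector performs this mapping itself; the sketch only illustrates what the Account Key and Region values are used for.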

Microsoft Entra ID Integrated

Auth ID: tokenBasedAuth

Applicable: All regions except Azure Government and Department of Defense (DoD) in Azure Government and US Government (GCC-High)

Use Microsoft Entra ID to access your speech service.

This is not a shareable connection. If the Power App is shared with another user, that user will be prompted to create a new connection explicitly.

Name	Type	Description	Required
Custom Subdomain	string	Custom subdomain endpoint url (Example: contoso)	True

Microsoft Entra ID Integrated (Azure Government)

Auth ID: tokenBasedAuth

Applicable: Azure Government and Department of Defense (DoD) in Azure Government and US Government (GCC-High) only

Use Microsoft Entra ID to access your speech service.

This is not a shareable connection. If the Power App is shared with another user, that user will be prompted to create a new connection explicitly.

Name	Type	Description	Required
Custom Subdomain	string	Custom subdomain endpoint url (Example: contoso)	True

Default [DEPRECATED]

Applicable: All regions

This option is only for older connections without an explicit authentication type, and is only provided for backward compatibility.

This is not a shareable connection. If the Power App is shared with another user, that user will be prompted to create a new connection explicitly.

Name	Type	Description	Required
Account Key	securestring	Azure Cognitive Services for Batch Speech-to-text Account Key	True
Region	string	Speech service region (Example: eastus)	True

Throttling Limits

Name	Calls	Renewal Period
API calls per connection	100	60 seconds
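
A caller that drives this connector in a loop may want to stay under the limit above on its own side. The following is a minimal sketch of a sliding-window budget, assuming the limit is 100 calls in any 60-second window; the class name `CallBudget` and the injected `clock` are illustrative choices, not part of the connector.

```python
import time
from collections import deque


class CallBudget:
    """Sliding-window limiter: at most `limit` calls per `period` seconds."""

    def __init__(self, limit=100, period=60.0, clock=time.monotonic):
        self.limit = limit
        self.period = period
        self.clock = clock
        self._stamps = deque()

    def wait_time(self) -> float:
        """Seconds to wait before the next call fits in the window."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self._stamps and now - self._stamps[0] >= self.period:
            self._stamps.popleft()
        if len(self._stamps) < self.limit:
            return 0.0
        return self.period - (now - self._stamps[0])

    def record(self) -> None:
        """Note that a call was just made."""
        self._stamps.append(self.clock())
```

A typical loop would call `time.sleep(budget.wait_time())` before each request and `budget.record()` right after sending it.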

Actions

Create transcription (V3.1)	Creates a new transcription.
Delete transcription (V3.1)	Deletes the specified transcription task.
Get supported locales (V3.1)	Gets a list of supported locales for offline transcriptions.
Get transcription file (V3.1)	Gets one specific file (identified with fileId) from a transcription (identified with id).
Get transcriptions (V3.1)	Gets the transcription identified by the given ID.
Get transcriptions list (V3.1)	Gets a list of transcriptions for the authenticated subscription.
Get transcriptions list files (V3.1)	Gets the files of the transcription identified by the given ID.
Update transcription (V3.1)	Updates the mutable details of the transcription identified by its ID.

Create transcription (V3.1)

Operation ID: CreateTranscriptions

Creates a new transcription.

Parameters

Name	Key	Required	Type	Description
contentUrls	contentUrls		array of uri	You can provide a list of content URLs to get audio files to transcribe. Up to 1000 URLs are allowed. This property will not be returned in a response.
contentContainerUrl	contentContainerUrl		uri	Alternatively, you can provide a URL for an Azure blob container that contains the audio files. A container is allowed to have a maximum size of 5GB and a maximum number of 10000 blobs. The maximum size for a blob is 2.5GB. Container SAS should contain 'r' (read) and 'l' (list) permissions. This property will not be returned in a response.
locale	locale	True	string	The locale of the contained data. If Language Identification is used, this locale is used to transcribe speech for which no language could be detected.
displayName	displayName	True	string	The display name of the object.
model	self		uri	The location of the referenced entity.
diarizationEnabled	diarizationEnabled		boolean	A value indicating whether diarization (speaker identification) is requested. The default value is `false`. If only this field is set to true and the improved diarization system is not enabled by specifying `DiarizationProperties`, a basic diarization system will distinguish between up to two speakers. No extra charges are applied in this case. The improved diarization system provides diarization for a configurable range of speakers. It can be configured in the `DiarizationProperties` field. DEPRECATED: The basic diarization system is deprecated and will be removed along with the `diarizationEnabled` setting in the next major version of the API.
wordLevelTimestampsEnabled	wordLevelTimestampsEnabled		boolean	A value indicating whether word level timestamps are requested. The default value is `false`.
displayFormWordLevelTimestampsEnabled	displayFormWordLevelTimestampsEnabled		boolean	A value indicating whether word level timestamps for the display form are requested. The default value is `false`.
channels	channels		array of integer	A collection of the requested channel numbers. In the default case, the channels 0 and 1 are considered.
destinationContainerUrl	destinationContainerUrl		uri	The requested destination container. Remarks: When a destination container is used in combination with a `timeToLive`, the metadata of a transcription will be deleted normally, but the data stored in the destination container, including transcription results, will remain untouched, because no delete permissions are required for this container. To support automatic cleanup, either configure blob lifetimes on the container, or use "Bring your own Storage (BYOS)" instead of `destinationContainerUrl`, where blobs can be cleaned up.
punctuationMode	punctuationMode		string	The mode used for punctuation.
profanityFilterMode	profanityFilterMode		string	Mode of profanity filtering.
timeToLive	timeToLive		string	How long the transcription will be kept in the system after it has completed. Once the transcription reaches the time to live after completion (successful or failed) it will be automatically deleted. Not setting this value or setting it to 0 will disable automatic deletion. The longest supported duration is 31 days. The duration is encoded as ISO 8601 duration ("PnYnMnDTnHnMnS", see https://en.wikipedia.org/wiki/ISO_8601#Durations).
minCount	minCount		integer	A hint for the minimum number of speakers for diarization. Must be smaller than or equal to the maxCount property.
maxCount	maxCount		integer	The maximum number of speakers for diarization. Must be less than 36 and larger than or equal to the minCount property.
candidateLocales	candidateLocales	True	array of string	The candidate locales for language identification (example ["en-US", "de-DE", "es-ES"]). A minimum of 2 and a maximum of 10 candidate locales, including the main locale for the transcription, is supported.
speechModelMapping	speechModelMapping		object	An optional mapping of locales to speech model entities. If no model is given for a locale, the default base model is used. Keys must be locales contained in the candidate locales, values are entities for models of the respective locales.
email	email		string	The email address to send email notifications to when the operation completes. The value will be removed after successfully sending the email.

Returns

Body: Transcription
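
To make the parameter list concrete, here is a minimal sketch that assembles a request body from the fields above. It assumes the body mirrors the Transcription definition (top-level `contentUrls` or `contentContainerUrl`, `locale`, `displayName`, and a nested `properties` object) and that exactly one of the two audio sources is supplied; the function name `build_transcription_body` is hypothetical.

```python
def build_transcription_body(display_name, locale, content_urls=None,
                             content_container_url=None, **properties):
    """Assemble a Create transcription payload from the parameters above."""
    # Assumption: exactly one audio source should be given.
    if bool(content_urls) == bool(content_container_url):
        raise ValueError("give either content_urls or content_container_url")
    if content_urls and len(content_urls) > 1000:
        raise ValueError("at most 1000 content URLs are allowed")

    body = {"displayName": display_name, "locale": locale}
    if content_urls:
        body["contentUrls"] = list(content_urls)
    else:
        body["contentContainerUrl"] = content_container_url
    # Optional settings such as wordLevelTimestampsEnabled go under "properties".
    if properties:
        body["properties"] = properties
    return body
```

For example, `build_transcription_body("demo", "en-US", content_urls=[sas_uri], wordLevelTimestampsEnabled=True)` yields a dictionary that can be serialized with `json.dumps`.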

Delete transcription (V3.1)

Operation ID: DeleteTranscriptions

Deletes the specified transcription task.

Parameters

Name	Key	Required	Type	Description
Id	id	True	uuid	The identifier of the transcription.

Get supported locales (V3.1)

Operation ID: SupportedTranscriptionLocalesList

Gets a list of supported locales for offline transcriptions.

Returns

Name	Path	Type	Description
		array of string

Get transcription file (V3.1)

Operation ID: GetTranscriptionsFile

Gets one specific file (identified with fileId) from a transcription (identified with id).

Parameters

Name	Key	Required	Type	Description
Id	id	True	uuid	The identifier of the transcription.
File Id	fileId	True	uuid	The identifier of the file.
Sas Validity In Seconds	sasValidityInSeconds		integer	The duration in seconds that an SAS url should be valid. The default duration is 12 hours. When using BYOS (https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-encryption-of-data-at-rest#bring-your-own-storage-byos-for-customization-and-logging): A value of 0 means that a plain blob URI without SAS token will be generated.

Returns

Body: File
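
As an illustration of how the three parameters combine, the sketch below builds a request URL for a single file. The path layout `/transcriptions/{id}/files/{fileId}` is an assumption about the underlying v3.1 REST API and is not given on this page; `transcription_file_url` and its `base` argument are hypothetical.

```python
import uuid
from urllib.parse import urlencode


def transcription_file_url(base, transcription_id, file_id,
                           sas_validity_seconds=None):
    """Build the (assumed) URL for one file of one transcription."""
    # uuid.UUID() rejects values that are not well-formed UUIDs.
    tid = uuid.UUID(str(transcription_id))
    fid = uuid.UUID(str(file_id))
    url = f"{base.rstrip('/')}/transcriptions/{tid}/files/{fid}"
    if sas_validity_seconds is not None:
        if sas_validity_seconds < 0:
            raise ValueError("sas_validity_seconds must not be negative")
        # With BYOS, 0 requests a plain blob URI without a SAS token.
        url += "?" + urlencode({"sasValidityInSeconds": sas_validity_seconds})
    return url
```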

Get transcriptions (V3.1)

Operation ID: GetTranscriptions

Gets the transcription identified by the given ID.

Parameters

Name	Key	Required	Type	Description
Id	id	True	uuid	The identifier of the transcription.

Returns

Body: Transcription

Get transcriptions list (V3.1)

Operation ID: TranscriptionsList

Gets a list of transcriptions for the authenticated subscription.

Parameters

Name	Key	Required	Type	Description
Skip	skip		integer	Number of datasets that will be skipped.
Top	top		integer	Number of datasets that will be included after skipping.
Filter	filter		string	A filtering expression for selecting a subset of the available transcriptions. Supported properties: displayName, description, createdDateTime, lastActionDateTime, status, locale. Operators: - eq, ne are supported for all properties. - gt, ge, lt, le are supported for createdDateTime and lastActionDateTime. - and, or, not are supported. Example: `filter=createdDateTime gt 2022-02-01T11:00:00Z`

Returns

Body: PaginatedTranscriptions
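
The filter syntax is easiest to see with a generated example. This sketch builds a `createdDateTime gt ...` clause like the one shown in the Filter description and encodes `skip`, `top` and `filter` as a query string; the helper names are hypothetical, and percent-encoding the expression is an assumption about how a direct REST caller would send it.

```python
from datetime import datetime, timezone
from urllib.parse import quote, urlencode


def created_after(moment: datetime) -> str:
    """Clause matching transcriptions created after `moment`."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    stamp = moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"createdDateTime gt {stamp}"


def list_query(skip=None, top=None, filter_expr=None) -> str:
    """Encode the optional skip, top and filter parameters."""
    params = {}
    if skip is not None:
        params["skip"] = skip
    if top is not None:
        params["top"] = top
    if filter_expr:
        params["filter"] = filter_expr
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return urlencode(params, quote_via=quote)
```

Clauses can be joined with `and`, `or` and `not`, for example `created_after(t) + " and locale eq 'en-US'"`.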

Get transcriptions list files (V3.1)

Operation ID: TranscriptionsListFiles

Gets the files of the transcription identified by the given ID.

Parameters

Name	Key	Required	Type	Description
Id	id	True	uuid	The identifier of the transcription.
Sas Validity In Seconds	sasValidityInSeconds		integer	The duration in seconds that an SAS url should be valid. The default duration is 12 hours. When using BYOS (https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-encryption-of-data-at-rest#bring-your-own-storage-byos-for-customization-and-logging): A value of 0 means that a plain blob URI without SAS token will be generated.
Skip	skip		integer	Number of datasets that will be skipped.
Top	top		integer	Number of datasets that will be included after skipping.
Filter	filter		string	A filtering expression for selecting a subset of the available files. Supported properties: name, createdDateTime, kind. Operators: - eq, ne are supported for all properties. - gt, ge, lt, le are supported for createdDateTime. - and, or, not are supported. Example: `filter=name eq 'myaudio.wav.json' and kind eq 'Transcription'`

Returns

Body: PaginatedFiles
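
Once a page of files has been returned, a flow usually needs only the result files. The sketch below picks the `contentUrl` of every file whose `kind` is `Transcription`, assuming the response is a dictionary shaped like the PaginatedFiles and File definitions; the kind value is taken from the filter example above, and the function name is hypothetical.

```python
def transcription_content_urls(page: dict) -> list[str]:
    """Return the contentUrl of each result file in a PaginatedFiles page."""
    return [
        item["links"]["contentUrl"]
        for item in page.get("values", [])
        if item.get("kind") == "Transcription"
    ]
```

The same selection can be done on the service side with `filter=kind eq 'Transcription'`, which avoids transferring entries that will be discarded.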

Update transcription (V3.1)

Operation ID: UpdateTranscriptions

Updates the mutable details of the transcription identified by its ID.

Parameters

Name	Key	Required	Type	Description
Id	id	True	uuid	The identifier of the transcription.
self	self	True	uri	The location of the referenced entity.
displayName	displayName		string	The name of the object.
description	description		string	The description of the object.
customProperties	customProperties		object	The custom properties of this entity. The maximum allowed key length is 64 characters, the maximum allowed value length is 256 characters and the count of allowed entries is 10.

Returns

Body: Transcription
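
The limits on `customProperties` can be checked before the update is sent. A minimal sketch, assuming keys and values are plain strings and lengths are counted in characters; `check_custom_properties` is a hypothetical helper.

```python
def check_custom_properties(props: dict[str, str]) -> None:
    """Raise ValueError if `props` exceeds the documented limits."""
    if len(props) > 10:
        raise ValueError("at most 10 custom properties are allowed")
    for key, value in props.items():
        if len(key) > 64:
            raise ValueError(f"key longer than 64 characters: {key[:20]}...")
        if len(value) > 256:
            raise ValueError(f"value for {key!r} longer than 256 characters")
```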

Definitions

DiarizationProperties

Name	Path	Type	Description
speakers	speakers	DiarizationSpeakersProperties

DiarizationSpeakersProperties

Name	Path	Type	Description
minCount	minCount	integer	A hint for the minimum number of speakers for diarization. Must be smaller than or equal to the maxCount property.
maxCount	maxCount	integer	The maximum number of speakers for diarization. Must be less than 36 and larger than or equal to the minCount property.
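
The two speaker bounds constrain each other, which a small check makes explicit. This sketch builds the nested `diarization` property following the DiarizationProperties and DiarizationSpeakersProperties definitions; the lower bound of 1 for `minCount` is an assumption, and the function name is hypothetical.

```python
def diarization_properties(min_count: int, max_count: int) -> dict:
    """Return a `diarization` property after validating the speaker range."""
    if min_count < 1:  # assumption: at least one speaker
        raise ValueError("min_count must be at least 1")
    if max_count >= 36:
        raise ValueError("max_count must be less than 36")
    if min_count > max_count:
        raise ValueError("min_count must not exceed max_count")
    return {
        "diarization": {
            "speakers": {"minCount": min_count, "maxCount": max_count}
        }
    }
```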

File

Name	Path	Type	Description
kind	kind	FileKind	Type of data.
links	links	FileLinks
createdDateTime	createdDateTime	date-time	The creation time of this file. The time stamp is encoded as ISO 8601 date and time format (see https://en.wikipedia.org/wiki/ISO_8601#Combined_date_and_time_representations).
properties	properties	FileProperties
name	name	string	The name of this file.

FileKind

Type of data.

: string

FileLinks

Name	Path	Type	Description
contentUrl	contentUrl	uri	The url to retrieve the content of this file.

FileProperties

Name	Path	Type	Description
size	size	integer	The size of the data in bytes.
duration	duration	string	The duration in case this file is an audio file. The duration is encoded as ISO 8601 duration ("PnYnMnDTnHnMnS", see https://en.wikipedia.org/wiki/ISO_8601#Durations).

LanguageIdentificationProperties

Name	Path	Type	Description
candidateLocales	candidateLocales	array of string	The candidate locales for language identification (example ["en-US", "de-DE", "es-ES"]). A minimum of 2 and a maximum of 10 candidate locales, including the main locale for the transcription, is supported.
speechModelMapping	speechModelMapping	object	An optional mapping of locales to speech model entities. If no model is given for a locale, the default base model is used. Keys must be locales contained in the candidate locales, values are entities for models of the respective locales.
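
The candidate-locale rules can be enforced when the property is built. The sketch below reads "including the main locale" as meaning the main locale counts toward the 2 to 10 total and adds it when missing; that interpretation, and the function name `language_identification`, are assumptions.

```python
def language_identification(main_locale, candidates, model_mapping=None):
    """Return a `languageIdentification` property after validation."""
    # Remove duplicates while keeping the caller's order.
    locales = list(dict.fromkeys(candidates))
    if main_locale not in locales:
        locales.insert(0, main_locale)
    if not 2 <= len(locales) <= 10:
        raise ValueError("between 2 and 10 candidate locales are supported")

    props = {"candidateLocales": locales}
    if model_mapping:
        unknown = set(model_mapping) - set(locales)
        if unknown:
            raise ValueError(f"mapping keys not in candidates: {sorted(unknown)}")
        props["speechModelMapping"] = dict(model_mapping)
    return {"languageIdentification": props}
```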

PaginatedFiles

Name	Path	Type	Description
values	values	array of File	A list of entities limited by either the passed query parameters 'skip' and 'top' or their default values. When iterating through a list using pagination and deleting entities in parallel, some entities will be skipped in the results. It's recommended to build a list on the client and delete after the fetching of the complete list.
@nextLink	@nextLink	uri	A link to the next set of paginated results if there are more entities available; otherwise null.

PaginatedTranscriptions

Name	Path	Type	Description
values	values	array of Transcription	A list of entities limited by either the passed query parameters 'skip' and 'top' or their default values. When iterating through a list using pagination and deleting entities in parallel, some entities will be skipped in the results. It's recommended to build a list on the client and delete after the fetching of the complete list.
@nextLink	@nextLink	uri	A link to the next set of paginated results if there are more entities available; otherwise null.
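
The recommendation to build the full list before deleting can be sketched as a loop that follows `@nextLink` until it is null. The `fetch_page` callable is a hypothetical stand-in for whatever performs the HTTP request and returns the decoded JSON page; it is injected so the loop itself stays free of network code.

```python
def collect_all(first_url, fetch_page):
    """Gather every entity across pages before acting on any of them.

    `fetch_page(url)` must return a dict with a "values" list and an
    optional "@nextLink" entry.
    """
    items = []
    url = first_url
    while url:
        page = fetch_page(url)
        items.extend(page.get("values", []))
        # A missing or null @nextLink ends the iteration.
        url = page.get("@nextLink")
    return items
```

Deleting only after `collect_all` returns avoids the skipped entries that occur when deletions shift the page boundaries during iteration.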

ProfanityFilterMode

Mode of profanity filtering.

: string

PunctuationMode

The mode used for punctuation.

: string

Transcription

Name	Path	Type	Description
contentUrls	contentUrls	array of uri	You can provide a list of content URLs to get audio files to transcribe. Up to 1000 URLs are allowed. This property will not be returned in a response.
contentContainerUrl	contentContainerUrl	uri	Alternatively, you can provide a URL for an Azure blob container that contains the audio files. A container is allowed to have a maximum size of 5GB and a maximum number of 10000 blobs. The maximum size for a blob is 2.5GB. Container SAS should contain 'r' (read) and 'l' (list) permissions. This property will not be returned in a response.
locale	locale	string	The locale of the contained data. If Language Identification is used, this locale is used to transcribe speech for which no language could be detected.
displayName	displayName	string	The display name of the object.
model	model.self	uri	The location of the referenced entity.
properties	properties	TranscriptionProperties

TranscriptionProperties

Name	Path	Type	Description
diarizationEnabled	diarizationEnabled	boolean	A value indicating whether diarization (speaker identification) is requested. The default value is `false`. If only this field is set to true and the improved diarization system is not enabled by specifying `DiarizationProperties`, a basic diarization system will distinguish between up to two speakers. No extra charges are applied in this case. The improved diarization system provides diarization for a configurable range of speakers. It can be configured in the `DiarizationProperties` field. DEPRECATED: The basic diarization system is deprecated and will be removed along with the `diarizationEnabled` setting in the next major version of the API.
wordLevelTimestampsEnabled	wordLevelTimestampsEnabled	boolean	A value indicating whether word level timestamps are requested. The default value is `false`.
displayFormWordLevelTimestampsEnabled	displayFormWordLevelTimestampsEnabled	boolean	A value indicating whether word level timestamps for the display form are requested. The default value is `false`.
channels	channels	array of integer	A collection of the requested channel numbers. In the default case, the channels 0 and 1 are considered.
destinationContainerUrl	destinationContainerUrl	uri	The requested destination container. Remarks: When a destination container is used in combination with a `timeToLive`, the metadata of a transcription will be deleted normally, but the data stored in the destination container, including transcription results, will remain untouched, because no delete permissions are required for this container. To support automatic cleanup, either configure blob lifetimes on the container, or use "Bring your own Storage (BYOS)" instead of `destinationContainerUrl`, where blobs can be cleaned up.
punctuationMode	punctuationMode	PunctuationMode	The mode used for punctuation.
profanityFilterMode	profanityFilterMode	ProfanityFilterMode	Mode of profanity filtering.
timeToLive	timeToLive	string	How long the transcription will be kept in the system after it has completed. Once the transcription reaches the time to live after completion (successful or failed) it will be automatically deleted. Not setting this value or setting it to 0 will disable automatic deletion. The longest supported duration is 31 days. The duration is encoded as ISO 8601 duration ("PnYnMnDTnHnMnS", see https://en.wikipedia.org/wiki/ISO_8601#Durations).
diarization	diarization	DiarizationProperties
languageIdentification	languageIdentification	LanguageIdentificationProperties
email	email	string	The email address to send email notifications to when the operation completes. The value will be removed after successfully sending the email.
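
Because `timeToLive` takes an ISO 8601 duration string, a small encoder can help avoid malformed values. This sketch converts a `timedelta` into the day/hour/minute/second form and enforces the 31-day maximum; it rejects zero because omitting the property already disables automatic deletion. The name `time_to_live` is hypothetical.

```python
from datetime import timedelta

MAX_TTL = timedelta(days=31)


def time_to_live(ttl: timedelta) -> str:
    """Encode `ttl` as an ISO 8601 duration such as "P2DT12H"."""
    total = int(ttl.total_seconds())
    if total < 1 or ttl > MAX_TTL:
        raise ValueError("ttl must be between 1 second and 31 days")
    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)

    out = "P"
    if days:
        out += f"{days}D"
    # The time components follow a "T" separator and are omitted when zero.
    time_part = ""
    if hours:
        time_part += f"{hours}H"
    if minutes:
        time_part += f"{minutes}M"
    if seconds:
        time_part += f"{seconds}S"
    if time_part:
        out += "T" + time_part
    return out
```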
