AssemblyAI (Preview)
Transcribe and extract data from audio using AssemblyAI's Speech AI.
This connector is available in the following products and regions:
Service | Class | Regions |
---|---|---|
Logic Apps | Standard | All Logic Apps regions except the following: - Azure Government regions - Azure China regions - US Department of Defense (DoD) |
Contact | |
---|---|
Name | Support |
URL | https://www.assemblyai.com/docs/ |
Email | support@assemblyai.com |
Connector Metadata | |
---|---|
Publisher | AssemblyAI |
Website | https://www.assemblyai.com |
Privacy policy | https://www.assemblyai.com/legal/privacy-policy |
Categories | AI |
With the AssemblyAI Connector, you can use AssemblyAI's models to process audio data by transcribing it with speech recognition models, analyzing it with audio intelligence models, and building generative features on top of it with LLMs.
- Speech-To-Text, with many configurable features such as speaker diarization, custom spelling, and custom vocabulary.
- Audio Intelligence Models, additional AI models that are enabled and configured through the transcription configuration.
- LeMUR, which lets you apply various LLMs to your transcripts without building your own RAG infrastructure for very large transcripts.
Prerequisites
You will need the following to proceed:
- An AssemblyAI API key (get one for free)
How to get credentials
You can get an AssemblyAI API key for free by signing up for an account and copying the API key from the dashboard.
Get started with your connector
Follow these steps to transcribe audio using the AssemblyAI connector.
Upload a File
To transcribe an audio file using AssemblyAI, the file needs to be accessible to AssemblyAI. If your audio file is already accessible via a URL, you can use your existing URL.
Otherwise, you can use the Upload a File action to upload a file to AssemblyAI.
You will get back a URL for your file which can only be used to transcribe using your API key.
Once you transcribe the file, the file will be removed from AssemblyAI's servers.
Transcribe Audio
To transcribe your audio, configure the Audio URL parameter using your audio file URL.
Then, configure the additional parameters to enable more Speech Recognition features and Audio Intelligence models.
The result of the Transcribe Audio action is a queued transcript which will start being processed immediately. To get the completed transcript, you have two options:
Handle the Transcript Ready Webhook
If you don't want to handle the webhook using Logic Apps or Power Automate, configure the Webhook URL parameter in your Transcribe Audio action, and implement your webhook following AssemblyAI's webhook documentation.
To handle the webhook using Logic Apps or Power Automate, follow these steps:
- Create a separate Logic App or Power Automate Flow.
- Configure When an HTTP request is received as the trigger:
  - Set Who Can Trigger The Flow? to Anyone
  - Set Request Body JSON Schema to: { "type": "object", "properties": { "transcript_id": { "type": "string" }, "status": { "type": "string" } } }
  - Set Method to POST
- Add an AssemblyAI Get Transcript action, passing in the transcript_id from the trigger to the Transcript ID parameter.
- Before doing anything else, you should check whether the Status is completed or error. Add a Condition action that checks if the Status from the Get Transcript output is error:
  - In the True branch, add a Terminate action
    - Set the Status to Failed
    - Set the Code to Transcript Error
    - Pass the Error from the Get Transcript output to the Message parameter.
  - You can leave the False branch empty.
  Now you can add any action after the Condition knowing the transcript status is completed, and you can retrieve any of the output properties of the Get Transcript action.
- Save your Logic App or Flow. The HTTP URL will be generated for the When an HTTP request is received trigger. Copy the HTTP URL and head back to your original Logic App or Flow.
- In your original Logic App or Flow, update the Transcribe Audio action. Paste the HTTP URL you copied previously into the Webhook URL parameter, and save.
When the transcript status becomes completed or error, AssemblyAI will send an HTTP POST request to the webhook URL, which will be handled by your other Logic App or Flow.
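If you implement the webhook outside Logic Apps, the decision made by the Condition step can be sketched in Python as follows. This is a minimal illustration that assumes the request body carries the two fields from the JSON schema above (transcript_id and status); the function name and the shape of the returned dictionary are hypothetical and not part of the connector.

```python
import json


def handle_transcript_webhook(raw_body: str) -> dict:
    """Parse a Transcript Ready webhook body and decide the next step.

    Hypothetical helper: only the field names transcript_id and status
    come from the request body schema; the rest is illustrative.
    """
    payload = json.loads(raw_body)
    transcript_id = payload.get("transcript_id")
    status = payload.get("status")

    if not isinstance(transcript_id, str) or not isinstance(status, str):
        raise ValueError("body must contain string transcript_id and status")

    if status == "error":
        # Corresponds to the Terminate action in the True branch.
        return {"action": "terminate", "transcript_id": transcript_id}
    if status == "completed":
        # Corresponds to continuing after the Condition.
        return {"action": "get_transcript", "transcript_id": transcript_id}
    raise ValueError(f"unexpected status: {status!r}")
```

A caller would then fetch the transcript by ID when the action is get_transcript, which mirrors the Get Transcript step in the flow.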
As an alternative to using the webhook, you can poll the transcript status as explained in the next section.
Poll the Transcript Status
You can poll the transcript status using the following steps:
- Add an Initialize variable action
  - Set Name to transcript_status
  - Set Type to String
  - Store the Status from the Transcribe Audio output into the Value parameter
- Add a Do until action
  - Configure the Loop Until parameter with the following Fx code: or(equals(variables('transcript_status'), 'completed'), equals(variables('transcript_status'), 'error'))
    This code checks whether the transcript_status variable is completed or error.
  - Configure the Count parameter to 86400
  - Configure the Timeout parameter to PT24H
  Inside the Do until action, add the following actions:
  - Add a Delay action that waits for one second
  - Add a Get Transcript action and pass the ID from the Transcribe Audio output to the Transcript ID parameter.
  - Add a Set variable action
    - Set Name to transcript_status
    - Pass the Status of the Get Transcript output to the Value parameter
  The Do until loop will continue until the transcript is completed or an error occurs.
- Add another Get Transcript action, like before, but add it after the Do until loop so its output becomes available outside the scope of the Do until action.
Before doing anything else, you should check whether the transcript Status is completed or error.
Add a Condition action that checks if the transcript_status is error:
- In the True branch, add a Terminate action
  - Set Status to Failed
  - Set Code to Transcript Error
  - Pass the Error from the Get Transcript output to the Message parameter.
- You can leave the False branch empty.
Now you can add any action after the Condition knowing the transcript status is completed, and you can retrieve any of the output properties of the Get Transcript action.
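For comparison, the same polling logic can be sketched in Python. This is an illustration under stated assumptions, not the connector's implementation: get_transcript is a hypothetical callable that returns the transcript as a dictionary with a status field (and an error field on failure), and the defaults simply copy the one-second delay and the count of 86400 used above.

```python
import time
from typing import Callable


def poll_until_done(
    get_transcript: Callable[[str], dict],
    transcript_id: str,
    delay_seconds: float = 1.0,
    max_attempts: int = 86400,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the transcript status is completed or error."""
    for _ in range(max_attempts):
        sleep(delay_seconds)  # Delay action
        transcript = get_transcript(transcript_id)  # Get Transcript action
        if transcript.get("status") in ("completed", "error"):
            break  # Loop Until condition is satisfied
    else:
        # Loop ended without reaching a final status (Count exhausted).
        raise TimeoutError(f"transcript {transcript_id} did not finish")

    if transcript["status"] == "error":
        # Corresponds to the Terminate action in the True branch.
        raise RuntimeError(transcript.get("error", "Transcript Error"))
    return transcript
```

The sleep parameter is injectable so the loop can be exercised without real waiting.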
Add more actions
Now that you have a completed transcript, you can use many other actions by passing in the ID of the transcript, such as:
- Get Sentences in Transcript
- Get Paragraphs in Transcript
- Get Subtitles for Transcript
- Get Redacted Audio
- Search Words in Transcript
- Run a Task Using LeMUR
Known issues and limitations
No known issues currently. We don't support Streaming Speech-To-Text (real-time) as it is not possible using Custom Connectors.
Common errors and remedies
You can find more information about errors in the AssemblyAI documentation.
FAQ
You can find frequently asked questions in our documentation.
Creating a connection
The connector supports the following authentication types:
Default | Parameters for creating connection. | All regions | Not shareable |
Default
Applicable: All regions
Parameters for creating connection.
This is not a shareable connection. If the power app is shared with another user, that user will be prompted to create a new connection explicitly.
Name | Type | Description | Required |
---|---|---|---|
AssemblyAI API Key | securestring | The AssemblyAI API Key to authenticate the AssemblyAI API. | True |
Throttling Limits
Name | Calls | Renewal Period |
---|---|---|
API calls per connection | 100 | 60 seconds |
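A client that calls the connector in a tight loop can stay under this limit by pacing itself. The sketch below is one possible approach, a sliding-window limiter with the 100 calls per 60 seconds from the table as defaults. Whether the service counts calls in a sliding or a fixed window is not stated here, so the sliding window is an assumption, and the class name is hypothetical.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Track call times and report how long to wait before the next call."""

    def __init__(
        self,
        max_calls: int = 100,
        period: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self._calls: Deque[float] = deque()

    def wait_time(self) -> float:
        """Return seconds to wait before a call is allowed (0.0 if none)."""
        now = self.clock()
        # Drop calls that have left the window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        if len(self._calls) < self.max_calls:
            return 0.0
        # The oldest call must age out before another call fits.
        return self.period - (now - self._calls[0])

    def record(self) -> None:
        """Record that a call was just made."""
        self._calls.append(self.clock())
```

Before each request, sleep for wait_time() seconds and then call record().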
Actions
Delete Transcript | Delete the transcript. Deleting does not delete the resource itself, but removes the data from the resource and marks it as deleted. |
Get Paragraphs in Transcript | Get the transcript split by paragraphs. The API will attempt to semantically segment your transcript into paragraphs to create more reader-friendly transcripts. |
Get Redacted Audio | Retrieve the redacted audio object containing the status and URL to the redacted audio. |
Get Sentences in Transcript | Get the transcript split by sentences. The API will attempt to semantically segment the transcript into sentences to create more reader-friendly transcripts. |
Get Subtitles for Transcript | Export your transcript in SRT or VTT format to use with a video player for subtitles and closed captions. |
Get Transcript | Get the transcript resource. The transcript is ready when the "status" is "completed". |
List Transcripts | Retrieve a list of transcripts you created. Transcripts are sorted from newest to oldest. The previous URL always points to a page with older transcripts. |
Purge LeMUR Request Data | Delete the data for a previously submitted LeMUR request. The LLM response data, as well as any context provided in the original request will be removed. |
Retrieve LeMUR Response | Retrieve a LeMUR response that was previously generated. |
Run a Task Using LeMUR | Use the LeMUR task endpoint to input your own LLM prompt. |
Search Words in Transcript | Search through the transcript for keywords. You can search for individual words, numbers, or phrases containing up to five words or numbers. |
Transcribe Audio | Create a transcript from a media file that is accessible via a URL. |
Upload a Media File | Upload a media file to AssemblyAI's servers. |
Delete Transcript
Delete the transcript. Deleting does not delete the resource itself, but removes the data from the resource and marks it as deleted.
Parameters
Name | Key | Required | Type | Description |
---|---|---|---|---|
Transcript ID | transcript_id | True | string | ID of the transcript |
Returns
A transcript object
- Body
- Transcript
Get Paragraphs in Transcript
Get the transcript split by paragraphs. The API will attempt to semantically segment your transcript into paragraphs to create more reader-friendly transcripts.
Parameters
Name | Key | Required | Type | Description |
---|---|---|---|---|
Transcript ID | transcript_id | True | string | ID of the transcript |
Returns
- Body
- ParagraphsResponse
Get Redacted Audio
Retrieve the redacted audio object containing the status and URL to the redacted audio.
Parameters
Name | Key | Required | Type | Description |
---|---|---|---|---|
Transcript ID | transcript_id | True | string | ID of the transcript |
Returns
Get Sentences in Transcript
Get the transcript split by sentences. The API will attempt to semantically segment the transcript into sentences to create more reader-friendly transcripts.
Parameters
Name | Key | Required | Type | Description |
---|---|---|---|---|
Transcript ID | transcript_id | True | string | ID of the transcript |
Returns
- Body
- SentencesResponse
Get Subtitles for Transcript
Export your transcript in SRT or VTT format to use with a video player for subtitles and closed captions.
Parameters
Name | Key | Required | Type | Description |
---|---|---|---|---|
Transcript ID | transcript_id | True | string | ID of the transcript |
Subtitle Format | subtitle_format | True | string | Format of the subtitles |
Number of Characters per Caption | chars_per_caption | | integer | The maximum number of characters per caption |
Returns
- response
- string
Get Transcript
Get the transcript resource. The transcript is ready when the "status" is "completed".
Parameters
Name | Key | Required | Type | Description |
---|---|---|---|---|
Transcript ID | transcript_id | True | string | ID of the transcript |
Returns
A transcript object
- Body
- Transcript
List Transcripts
Retrieve a list of transcripts you created. Transcripts are sorted from newest to oldest. The previous URL always points to a page with older transcripts.
Parameters
Name | Key | Required | Type | Description |
---|---|---|---|---|
Limit | limit | | integer | Maximum number of transcripts to retrieve |
Status | status | | string | The status of your transcript. Possible values are queued, processing, completed, or error. |
Created On | created_on | | date | Only get transcripts created on this date |
Before ID | before_id | | uuid | Get transcripts that were created before this transcript ID |
After ID | after_id | | uuid | Get transcripts that were created after this transcript ID |
Throttled Only | throttled_only | | boolean | Only get throttled transcripts, overrides the status filter |
Returns
A list of transcripts. Transcripts are sorted from newest to oldest. The previous URL always points to a page with older transcripts.
- Body
- TranscriptList
Purge LeMUR Request Data
Delete the data for a previously submitted LeMUR request. The LLM response data, as well as any context provided in the original request will be removed.
Parameters
Name | Key | Required | Type | Description |
---|---|---|---|---|
LeMUR Request ID | request_id | True | string | The ID of the LeMUR request whose data you want to delete. This would be found in the response of the original request. |
Returns
Retrieve LeMUR Response
Retrieve a LeMUR response that was previously generated.
Parameters
Name | Key | Required | Type | Description |
---|---|---|---|---|
LeMUR Request ID | request_id | True | string | The ID of the LeMUR request you previously made. This would be found in the response of the original request. |
Returns
- Body
- LemurResponse
Run a Task Using LeMUR
Use the LeMUR task endpoint to input your own LLM prompt.
Parameters
Name | Key | Required | Type | Description |
---|---|---|---|---|
Prompt | prompt | True | string | Your text to prompt the model to produce a desired output, including any context you want to pass into the model. |
Transcript IDs | transcript_ids | | array of uuid | A list of completed transcripts with text. Up to a maximum of 100 files or 100 hours, whichever is lower. Use either transcript_ids or input_text as input into LeMUR. |
Input Text | input_text | | string | Custom formatted transcript data. Maximum size is the context limit of the selected model, which defaults to 100000. Use either transcript_ids or input_text as input into LeMUR. |
Context | context | | string | Context to provide the model. This can be a string or a free-form JSON value. |
Final Model | final_model | | string | The model that is used for the final prompt after compression is performed. |
Maximum Output Size | max_output_size | | integer | Max output size in tokens, up to 4000 |
Temperature | temperature | | float | The temperature to use for the model. Higher values result in answers that are more creative, lower values are more conservative. Can be any value between 0.0 and 1.0 inclusive. |
Returns
- Body
- LemurTaskResponse
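The constraints in the parameter table can be checked before the action is called. The following Python sketch builds a request body keyed by the parameter keys listed above and validates the documented limits. The function name is hypothetical, and reading "use either transcript_ids or input_text" as "exactly one of the two" is an assumption.

```python
from typing import List, Optional


def build_lemur_task(
    prompt: str,
    transcript_ids: Optional[List[str]] = None,
    input_text: Optional[str] = None,
    max_output_size: Optional[int] = None,
    temperature: Optional[float] = None,
) -> dict:
    """Build a LeMUR task body, enforcing the limits from the table."""
    if not prompt:
        raise ValueError("prompt is required")
    # Assumption: exactly one input source must be given.
    if (transcript_ids is None) == (input_text is None):
        raise ValueError("provide exactly one of transcript_ids or input_text")
    if transcript_ids is not None and not 1 <= len(transcript_ids) <= 100:
        raise ValueError("transcript_ids must contain 1 to 100 IDs")
    if max_output_size is not None and not 0 < max_output_size <= 4000:
        raise ValueError("max_output_size must be at most 4000 tokens")
    if temperature is not None and not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0 inclusive")

    body: dict = {"prompt": prompt}
    if transcript_ids is not None:
        body["transcript_ids"] = list(transcript_ids)
    if input_text is not None:
        body["input_text"] = input_text
    if max_output_size is not None:
        body["max_output_size"] = max_output_size
    if temperature is not None:
        body["temperature"] = temperature
    return body
```

The 100-hour limit on transcripts cannot be checked from IDs alone, so this sketch only checks the count.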
Search Words in Transcript
Search through the transcript for keywords. You can search for individual words, numbers, or phrases containing up to five words or numbers.
Parameters
Name | Key | Required | Type | Description |
---|---|---|---|---|
Transcript ID | transcript_id | True | string | ID of the transcript |
Words | words | True | array | Keywords to search for |
Returns
- Body
- WordSearchResponse
Transcribe Audio
Create a transcript from a media file that is accessible via a URL.
Parameters
Name | Key | Required | Type | Description |
---|---|---|---|---|
Audio URL | audio_url | True | string | The URL of the audio or video file to transcribe. |
Language Code | language_code | | string | The language of your audio file. Possible values are found in Supported Languages. The default value is 'en_us'. |
Language Detection | language_detection | | boolean | Enable Automatic language detection, either true or false. |
Speech Model | speech_model | | string | The speech model to use for the transcription. |
Punctuate | punctuate | | boolean | Enable Automatic Punctuation, can be true or false |
Format Text | format_text | | boolean | Enable Text Formatting, can be true or false |
Disfluencies | disfluencies | | boolean | Transcribe Filler Words, like "umm", in your media file; can be true or false |
Dual Channel | dual_channel | | boolean | Enable Dual Channel transcription, can be true or false. |
Webhook URL | webhook_url | | string | The URL to which we send webhook requests. We send two different types of webhook requests: one request when a transcript is completed or failed, and one request when the redacted audio is ready if redact_pii_audio is enabled. |
Webhook Auth Header Name | webhook_auth_header_name | | string | The header name to be sent with the transcript completed or failed webhook requests |
Webhook Auth Header Value | webhook_auth_header_value | | string | The header value to send back with the transcript completed or failed webhook requests for added security |
Key Phrases | auto_highlights | | boolean | Enable Key Phrases, either true or false |
Audio Start From | audio_start_from | | integer | The point in time, in milliseconds, to begin transcribing in your media file |
Audio End At | audio_end_at | | integer | The point in time, in milliseconds, to stop transcribing in your media file |
Word Boost | word_boost | | array of string | The list of custom vocabulary to boost transcription probability for |
Word Boost Level | boost_param | | string | How much to boost specified words |
Filter Profanity | filter_profanity | | boolean | Filter profanity from the transcribed text, can be true or false |
Redact PII | redact_pii | | boolean | Redact PII from the transcribed text using the Redact PII model, can be true or false |
Redact PII Audio | redact_pii_audio | | boolean | Generate a copy of the original media file with spoken PII "beeped" out, can be true or false. See PII redaction for more details. |
Redact PII Audio Quality | redact_pii_audio_quality | | string | Controls the filetype of the audio created by redact_pii_audio. Currently supports mp3 (default) and wav. See PII redaction for more details. |
Redact PII Policies | redact_pii_policies | | array of string | The list of PII Redaction policies to enable. See PII redaction for more details. |
Redact PII Substitution | redact_pii_sub | | string | The replacement logic for detected PII, can be "entity_name" or "hash". See PII redaction for more details. |
Speaker Labels | speaker_labels | | boolean | Enable Speaker diarization, can be true or false |
Speakers Expected | speakers_expected | | integer | Tells the speaker label model how many speakers it should attempt to identify, up to 10. See Speaker diarization for more details. |
Content Moderation | content_safety | | boolean | Enable Content Moderation, can be true or false |
Content Moderation Confidence | content_safety_confidence | | integer | The confidence threshold for the Content Moderation model. Values must be between 25 and 100. |
Topic Detection | iab_categories | | boolean | Enable Topic Detection, can be true or false |
From | from | True | array of string | Words or phrases to replace |
To | to | True | string | Word or phrase to replace with |
Sentiment Analysis | sentiment_analysis | | boolean | Enable Sentiment Analysis, can be true or false |
Auto Chapters | auto_chapters | | boolean | Enable Auto Chapters, can be true or false |
Entity Detection | entity_detection | | boolean | Enable Entity Detection, can be true or false |
Speech Threshold | speech_threshold | | float | Reject audio files that contain less than this fraction of speech. Valid values are in the range [0, 1] inclusive. |
Enable Summarization | summarization | | boolean | Enable Summarization, can be true or false |
Summary Model | summary_model | | string | The model to summarize the transcript |
Summary Type | summary_type | | string | The type of summary |
Enable Custom Topics | custom_topics | | boolean | Enable custom topics, either true or false |
Custom Topics | topics | | array of string | The list of custom topics |
Returns
A transcript object
- Body
- Transcript
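Several of the parameters above have documented ranges that are easy to get wrong. The Python sketch below assembles a request body from a few of them and rejects out-of-range values before anything is sent. It is an illustration only: the function name is hypothetical, and grouping the From and To values into a custom_spelling list follows the Custom Spellings field of the Transcript definition rather than anything stated for the request itself.

```python
from typing import Dict, List, Optional


def build_transcribe_request(
    audio_url: str,
    speakers_expected: Optional[int] = None,
    content_safety_confidence: Optional[int] = None,
    speech_threshold: Optional[float] = None,
    custom_spelling: Optional[Dict[str, List[str]]] = None,
) -> dict:
    """Build a Transcribe Audio body, checking the documented ranges."""
    if not audio_url:
        raise ValueError("audio_url is required")
    body: dict = {"audio_url": audio_url}

    if speakers_expected is not None:
        if speakers_expected > 10:
            raise ValueError("speakers_expected can be at most 10")
        body["speakers_expected"] = speakers_expected

    if content_safety_confidence is not None:
        if not 25 <= content_safety_confidence <= 100:
            raise ValueError("content_safety_confidence must be 25 to 100")
        body["content_safety_confidence"] = content_safety_confidence

    if speech_threshold is not None:
        if not 0.0 <= speech_threshold <= 1.0:
            raise ValueError("speech_threshold must be in [0, 1]")
        body["speech_threshold"] = speech_threshold

    if custom_spelling:
        # Assumed shape: one object per replacement, keyed by from and to.
        body["custom_spelling"] = [
            {"from": list(sources), "to": target}
            for target, sources in custom_spelling.items()
        ]
    return body
```

For example, passing custom_spelling={"AssemblyAI": ["assembly ai"]} yields one entry that replaces "assembly ai" with "AssemblyAI".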
Upload a Media File
Upload a media file to AssemblyAI's servers.
Parameters
Name | Key | Required | Type | Description |
---|---|---|---|---|
File Content | file | True | binary | The file to upload. |
Returns
- Body
- UploadedFile
Definitions
RedactedAudioResponse
Name | Path | Type | Description |
---|---|---|---|
Status | status | string | The status of the redacted audio |
Redacted Audio URL | redacted_audio_url | string | The URL of the redacted audio file |
WordSearchResponse
Name | Path | Type | Description |
---|---|---|---|
Transcript ID | id | uuid | The ID of the transcript |
Total Count of Matches | total_count | integer | The total count of all matched instances. For example, if word 1 matched 2 times and word 2 matched 3 times, total_count will equal 5. |
Matches | matches | array of object | The matches of the search |
Text | matches.text | string | The matched word |
Count | matches.count | integer | The total number of times the word is in the transcript |
Timestamps | matches.timestamps | array of array | An array of timestamps |
Timestamp | matches.timestamps | array of integer | An array of timestamps structured as [start_time, end_time] in milliseconds |
Indexes | matches.indexes | array of integer | An array of all index locations for that word within the words array of the completed transcript |
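To show how these fields fit together, here is a small Python sketch that turns a WordSearchResponse, given as a dictionary, into a lookup from each matched word to its [start_time, end_time] pairs. The function name is hypothetical, and the consistency check on total_count simply applies the description above.

```python
from typing import Dict, List, Tuple


def index_matches(response: dict) -> Dict[str, List[Tuple[int, int]]]:
    """Map each matched word to its (start_ms, end_ms) pairs."""
    index: Dict[str, List[Tuple[int, int]]] = {}
    counted = 0
    for match in response.get("matches", []):
        # Each timestamp is a two-element [start_time, end_time] array.
        spans = [(start, end) for start, end in match.get("timestamps", [])]
        index[match["text"]] = spans
        counted += match.get("count", 0)
    # total_count should equal the sum of the per-word counts.
    if counted != response.get("total_count", counted):
        raise ValueError("total_count does not equal the sum of counts")
    return index
```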
Transcript
A transcript object
Name | Path | Type | Description |
---|---|---|---|
ID
|
id | uuid |
The unique identifier of your transcript |
Audio URL
|
audio_url | string |
The URL of the media that was transcribed |
Status
|
status | string |
The status of your transcript. Possible values are queued, processing, completed, or error. |
Language Code
|
language_code | string |
The language of your audio file. Possible values are found in Supported Languages. The default value is 'en_us'. |
Language Detection
|
language_detection | boolean |
Whether Automatic language detection is enabled, either true or false |
Speech Model
|
speech_model | string |
The speech model to use for the transcription. |
Text
|
text | string |
The textual transcript of your media file |
Words
|
words | array of object |
An array of temporally-sequential word objects, one for each word in the transcript. See Speech recognition for more information. |
Confidence
|
words.confidence | double | |
Start
|
words.start | integer | |
End
|
words.end | integer | |
Text
|
words.text | string | |
Speaker
|
words.speaker | string |
The speaker of the sentence if Speaker Diarization is enabled, else null |
Utterances
|
utterances | array of object |
When dual_channel or speaker_labels is enabled, a list of turn-by-turn utterance objects. See Speaker diarization for more information. |
Confidence
|
utterances.confidence | double |
The confidence score for the transcript of this utterance |
Start
|
utterances.start | integer |
The starting time, in milliseconds, of the utterance in the audio file |
End
|
utterances.end | integer |
The ending time, in milliseconds, of the utterance in the audio file |
Text
|
utterances.text | string |
The text for this utterance |
Words
|
utterances.words | array of object |
The words in the utterance. |
Confidence
|
utterances.words.confidence | double | |
Start
|
utterances.words.start | integer | |
End
|
utterances.words.end | integer | |
Text
|
utterances.words.text | string | |
Speaker
|
utterances.words.speaker | string |
The speaker of the sentence if Speaker Diarization is enabled, else null |
Speaker
|
utterances.speaker | string |
The speaker of this utterance, where each speaker is assigned a sequential capital letter - e.g. "A" for Speaker A, "B" for Speaker B, etc. |
Confidence
|
confidence | double |
The confidence score for the transcript, between 0.0 (low confidence) and 1.0 (high confidence) |
Audio Duration
|
audio_duration | integer |
The duration of this transcript object's media file, in seconds |
Punctuate
|
punctuate | boolean |
Whether Automatic Punctuation is enabled, either true or false |
Format Text
|
format_text | boolean |
Whether Text Formatting is enabled, either true or false |
Disfluencies
|
disfluencies | boolean |
Transcribe Filler Words, like "umm", in your media file; can be true or false |
Dual Channel
|
dual_channel | boolean |
Whether Dual channel transcription was enabled in the transcription request, either true or false |
Webhook URL
|
webhook_url | string |
The URL to which we send webhook requests. We sends two different types of webhook requests. One request when a transcript is completed or failed, and one request when the redacted audio is ready if redact_pii_audio is enabled. |
Webhook HTTP Status Code
|
webhook_status_code | integer |
The status code we received from your server when delivering the transcript completed or failed webhook request, if a webhook URL was provided |
Webhook Auth Enabled
|
webhook_auth | boolean |
Whether webhook authentication details were provided |
Webhook Auth Header Name
|
webhook_auth_header_name | string |
The header name to be sent with the transcript completed or failed webhook requests |
Speed Boost
|
speed_boost | boolean |
Whether speed boost is enabled |
Key Phrases
|
auto_highlights | boolean |
Whether Key Phrases is enabled, either true or false |
Status
|
auto_highlights_result.status | string |
Either success, or unavailable in the rare case that the model failed |
Results
|
auto_highlights_result.results | array of object |
A temporally-sequential array of Key Phrases |
Count
|
auto_highlights_result.results.count | integer |
The total number of times the key phrase appears in the audio file |
Rank
|
auto_highlights_result.results.rank | float |
The total relevancy to the overall audio file of this key phrase - a greater number means more relevant |
Text
|
auto_highlights_result.results.text | string |
The text itself of the key phrase |
Timestamps
|
auto_highlights_result.results.timestamps | array of object |
The timestamp of the of the key phrase |
Start
|
auto_highlights_result.results.timestamps.start | integer |
The start time in milliseconds |
End
|
auto_highlights_result.results.timestamps.end | integer |
The end time in milliseconds |
Audio Start From
|
audio_start_from | integer |
The point in time, in milliseconds, in the file at which the transcription was started |
Audio End At
|
audio_end_at | integer |
The point in time, in milliseconds, in the file at which the transcription was terminated |
Word Boost
|
word_boost | array of string |
The list of custom vocabulary to boost transcription probability for |
Boost
|
boost_param | string |
The word boost parameter value |
Filter Profanity
|
filter_profanity | boolean |
Whether Profanity Filtering is enabled, either true or false |
Redact PII
|
redact_pii | boolean |
Whether PII Redaction is enabled, either true or false |
Redact PII Audio
|
redact_pii_audio | boolean |
Whether a redacted version of the audio file was generated, either true or false. See PII redaction for more information. |
Redact PII Audio Quality
|
redact_pii_audio_quality | string |
Controls the filetype of the audio created by redact_pii_audio. Currently supports mp3 (default) and wav. See PII redaction for more details. |
Redact PII Policies
|
redact_pii_policies | array of string |
The list of PII Redaction policies that were enabled, if PII Redaction is enabled. See PII redaction for more information. |
Redact PII Substitution
|
redact_pii_sub | string |
The replacement logic for detected PII, can be "entity_name" or "hash". See PII redaction for more details. |
Speaker Labels
|
speaker_labels | boolean |
Whether Speaker diarization is enabled, can be true or false |
Speakers Expected
|
speakers_expected | integer |
Tell the speaker label model how many speakers it should attempt to identify, up to 10. See Speaker diarization for more details. |
Content Moderation
|
content_safety | boolean |
Whether Content Moderation is enabled, can be true or false |
Status
|
content_safety_labels.status | string |
Either success, or unavailable in the rare case that the model failed |
Results
|
content_safety_labels.results | array of object | |
Text
|
content_safety_labels.results.text | string |
The transcript of the section flagged by the Content Moderation model |
Labels
|
content_safety_labels.results.labels | array of object |
An array of safety labels, one per sensitive topic that was detected in the section |
Label
|
content_safety_labels.results.labels.label | string |
The label of the sensitive topic |
Confidence
|
content_safety_labels.results.labels.confidence | double |
The confidence score for the topic being discussed, from 0 to 1 |
Severity
|
content_safety_labels.results.labels.severity | double |
How severely the topic is discussed in the section, from 0 to 1 |
Sentence Index Start
|
content_safety_labels.results.sentences_idx_start | integer |
The sentence index at which the section begins |
Sentence Index End
|
content_safety_labels.results.sentences_idx_end | integer |
The sentence index at which the section ends |
Start
|
content_safety_labels.results.timestamp.start | integer |
The start time in milliseconds |
End
|
content_safety_labels.results.timestamp.end | integer |
The end time in milliseconds |
Summary
|
content_safety_labels.summary | object |
A summary of the Content Moderation confidence results for the entire audio file |
Severity Score Summary
|
content_safety_labels.severity_score_summary | object |
A summary of the Content Moderation severity results for the entire audio file |
Topic Detection
|
iab_categories | boolean |
Whether Topic Detection is enabled, can be true or false |
Status
|
iab_categories_result.status | string |
Either success, or unavailable in the rare case that the model failed |
Results
|
iab_categories_result.results | array of object |
An array of results for the Topic Detection model |
Text
|
iab_categories_result.results.text | string |
The text in the transcript in which a detected topic occurs |
Labels
|
iab_categories_result.results.labels | array of object | |
Relevance
|
iab_categories_result.results.labels.relevance | double |
How relevant the detected topic is of a detected topic |
Label
|
iab_categories_result.results.labels.label | string |
The IAB taxonomical label for the label of the detected topic, where > denotes supertopic/subtopic relationship |
Start
|
iab_categories_result.results.timestamp.start | integer |
The start time in milliseconds |
End
|
iab_categories_result.results.timestamp.end | integer |
The end time in milliseconds |
Summary
|
iab_categories_result.summary | object |
The overall relevance of topic to the entire audio file |
Custom Spellings
|
custom_spelling | array of object |
Customize how words are spelled and formatted using to and from values |
From
|
custom_spelling.from | array of string |
Words or phrases to replace |
To
|
custom_spelling.to | string |
Word or phrase to replace with |
Auto Chapters Enabled
|
auto_chapters | boolean |
Whether Auto Chapters is enabled, can be true or false |
Chapters
|
chapters | array of object |
An array of temporally sequential chapters for the audio file |
Gist
|
chapters.gist | string |
An ultra-short summary (just a few words) of the content spoken in the chapter |
Headline
|
chapters.headline | string |
A single sentence summary of the content spoken during the chapter |
Summary
|
chapters.summary | string |
A one paragraph summary of the content spoken during the chapter |
Start
|
chapters.start | integer |
The starting time, in milliseconds, for the chapter |
End
|
chapters.end | integer |
The starting time, in milliseconds, for the chapter |
Summarization Enabled
|
summarization | boolean |
Whether Summarization is enabled, either true or false |
Summary Type
|
summary_type | string |
The type of summary generated, if Summarization is enabled |
Summary Model
|
summary_model | string |
The Summarization model used to generate the summary, if Summarization is enabled |
Summary
|
summary | string |
The generated summary of the media file, if Summarization is enabled |
Custom Topics Enabled
|
custom_topics | boolean |
Whether custom topics is enabled, either true or false |
Topics
|
topics | array of string |
The list of custom topics provided if custom topics is enabled |
Sentiment Analysis
|
sentiment_analysis | boolean |
Whether Sentiment Analysis is enabled, can be true or false |
Sentiment Analysis Results
|
sentiment_analysis_results | array of object |
An array of results for the Sentiment Analysis model, if it is enabled. See Sentiment Analysis for more information. |
Text
|
sentiment_analysis_results.text | string |
The transcript of the sentence |
Start
|
sentiment_analysis_results.start | integer |
The starting time, in milliseconds, of the sentence |
End
|
sentiment_analysis_results.end | integer |
The ending time, in milliseconds, of the sentence |
Sentiment
|
sentiment_analysis_results.sentiment | string |
The detected sentiment for the sentence, one of POSITIVE, NEUTRAL, NEGATIVE |
Confidence
|
sentiment_analysis_results.confidence | double |
The confidence score for the detected sentiment of the sentence, from 0 to 1 |
Speaker
|
sentiment_analysis_results.speaker | string |
The speaker of the sentence if Speaker Diarization is enabled, else null |
Entity Detection
|
entity_detection | boolean |
Whether Entity Detection is enabled, either true or false |
Entities
|
entities | array of object |
An array of results for the Entity Detection model, if it is enabled. See Entity detection for more information. |
Entity Type
|
entities.entity_type | string |
The type of entity for the detected entity |
Text
|
entities.text | string |
The text for the detected entity |
Start
|
entities.start | integer |
The starting time, in milliseconds, at which the detected entity appears in the audio file |
End
|
entities.end | integer |
The ending time, in milliseconds, for the detected entity in the audio file |
Speech Threshold
|
speech_threshold | float |
Defaults to null. Reject audio files that contain less than this fraction of speech. Valid values are in the range [0, 1] inclusive. |
Throttled
|
throttled | boolean |
True while a request is throttled and false when a request is no longer throttled |
Error
|
error | string |
Error message explaining why the transcript failed |
Language Model
|
language_model | string |
The language model that was used for the transcript |
Acoustic Model
|
acoustic_model | string |
The acoustic model that was used for the transcript |
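
The timing and sentiment fields above can be post-processed with a few lines of code once the transcript JSON is available. The following is a minimal sketch, not part of the connector: it assumes the transcript has already been parsed into a Python dictionary whose keys match the Path column, and the helper names (`ms_to_timestamp`, `chapter_outline`, `sentiment_counts`) are hypothetical.

```python
from collections import Counter


def ms_to_timestamp(ms):
    """Convert milliseconds to an H:MM:SS string (fractions of a second are dropped)."""
    total_seconds = ms // 1000
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


def chapter_outline(transcript):
    """Return one 'start-end headline' line per chapter, in the order given."""
    # Assumption: a missing or null "chapters" field is treated as "no chapters".
    chapters = transcript.get("chapters") or []
    return [
        f"{ms_to_timestamp(c['start'])}-{ms_to_timestamp(c['end'])} {c['headline']}"
        for c in chapters
    ]


def sentiment_counts(transcript):
    """Count sentences per sentiment label (POSITIVE, NEUTRAL, NEGATIVE)."""
    results = transcript.get("sentiment_analysis_results") or []
    return Counter(r["sentiment"] for r in results)


# Hypothetical sample data shaped like the Transcript fields documented above.
sample = {
    "chapters": [{"start": 0, "end": 61500, "headline": "Intro"}],
    "sentiment_analysis_results": [
        {"text": "Great to be here.", "sentiment": "POSITIVE"},
        {"text": "Let's begin.", "sentiment": "NEUTRAL"},
    ],
}
print(chapter_outline(sample))
print(sentiment_counts(sample))
```

Because `start` and `end` are integers in milliseconds, integer division keeps the conversion exact; round instead if you prefer the nearest second.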
SentencesResponse
Name | Path | Type | Description |
---|---|---|---|
Transcript ID | id | uuid | |
Confidence | confidence | double | |
Audio Duration | audio_duration | number | |
Sentences | sentences | array of object | |
Text | sentences.text | string | |
Start | sentences.start | integer | |
End | sentences.end | integer | |
Confidence | sentences.confidence | double | |
Words | sentences.words | array of object | |
Confidence | sentences.words.confidence | double | |
Start | sentences.words.start | integer | |
End | sentences.words.end | integer | |
Text | sentences.words.text | string | |
Speaker | sentences.words.speaker | string | The speaker of the word if Speaker Diarization is enabled, else null |
Speaker | sentences.speaker | string | The speaker of the sentence if Speaker Diarization is enabled, else null |
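
When Speaker Diarization is enabled, the `sentences.speaker` field makes it straightforward to turn a SentencesResponse into a dialogue-style transcript. The sketch below is illustrative only: it assumes the response has been parsed into a dictionary, and `speaker_turns` is a hypothetical helper name.

```python
def speaker_turns(sentences_response):
    """Merge consecutive sentences by the same speaker into (speaker, text) turns."""
    turns = []
    for sentence in sentences_response.get("sentences") or []:
        # speaker is None when Speaker Diarization is not enabled.
        speaker = sentence.get("speaker")
        text = sentence["text"]
        if turns and turns[-1][0] == speaker:
            # Same speaker as the previous sentence: extend the current turn.
            turns[-1] = (speaker, turns[-1][1] + " " + text)
        else:
            turns.append((speaker, text))
    return turns


# Hypothetical sample shaped like a SentencesResponse.
sample = {
    "sentences": [
        {"text": "Hi there.", "speaker": "A"},
        {"text": "How are you?", "speaker": "A"},
        {"text": "Fine, thanks.", "speaker": "B"},
    ]
}
for speaker, text in speaker_turns(sample):
    print(f"{speaker}: {text}")
```

Without diarization every sentence has a null speaker, so the whole response collapses into a single turn.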
ParagraphsResponse
Name | Path | Type | Description |
---|---|---|---|
Transcript ID | id | uuid | |
Confidence | confidence | double | |
Audio Duration | audio_duration | number | |
Paragraphs | paragraphs | array of object | |
Text | paragraphs.text | string | |
Start | paragraphs.start | integer | |
End | paragraphs.end | integer | |
Confidence | paragraphs.confidence | double | |
Words | paragraphs.words | array of object | |
Confidence | paragraphs.words.confidence | double | |
Start | paragraphs.words.start | integer | |
End | paragraphs.words.end | integer | |
Text | paragraphs.words.text | string | |
Speaker | paragraphs.words.speaker | string | The speaker of the word if Speaker Diarization is enabled, else null |
Speaker | paragraphs.speaker | string | The speaker of the paragraph if Speaker Diarization is enabled, else null |
TranscriptList
A list of transcripts. Transcripts are sorted from newest to oldest. The previous URL always points to a page with older transcripts.
Name | Path | Type | Description |
---|---|---|---|
Limit | page_details.limit | integer | The number of results this page is limited to |
Result Count | page_details.result_count | integer | The actual number of results in the page |
Current URL | page_details.current_url | string | The URL used to retrieve the current page of transcripts |
Previous URL | page_details.prev_url | string | The URL to the previous page of transcripts. The previous URL always points to a page with older transcripts. |
Next URL | page_details.next_url | string | The URL to the next page of transcripts. The next URL always points to a page with newer transcripts. |
Transcripts | transcripts | array of object | |
ID | transcripts.id | uuid | |
Resource URL | transcripts.resource_url | string | |
Status | transcripts.status | string | The status of your transcript. Possible values are queued, processing, completed, or error. |
Created | transcripts.created | string | |
Completed | transcripts.completed | string | |
Audio URL | transcripts.audio_url | string | |
Error | transcripts.error | string | Error message explaining why the transcript failed |
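
Because transcripts are sorted from newest to oldest, walking the whole list means following `page_details.prev_url` from page to page. The sketch below shows one way to do that. It makes two assumptions that the table above does not state: that `prev_url` is null or absent on the oldest page, and that you supply your own `fetch_page` callable (a hypothetical name) that returns the parsed TranscriptList for a given URL.

```python
def iter_transcripts(first_page, fetch_page, max_pages=100):
    """Yield transcripts newest-first, following prev_url toward older pages.

    fetch_page is a caller-supplied function (hypothetical) that takes a URL
    and returns the parsed TranscriptList for that page.
    """
    page = first_page
    pages_read = 0
    while True:
        yield from page.get("transcripts") or []
        pages_read += 1
        prev_url = (page.get("page_details") or {}).get("prev_url")
        # Stop on the oldest page (assumed: no prev_url) or at the safety limit.
        if not prev_url or pages_read >= max_pages:
            return
        page = fetch_page(prev_url)


# Hypothetical in-memory pages standing in for real responses.
older_pages = {
    "page-2": {"page_details": {"prev_url": None}, "transcripts": [{"id": "c"}]},
}
newest_page = {
    "page_details": {"prev_url": "page-2"},
    "transcripts": [{"id": "a"}, {"id": "b"}],
}
for transcript in iter_transcripts(newest_page, older_pages.__getitem__):
    print(transcript["id"])
```

The `max_pages` cap is a defensive choice so that an unexpected cycle in the URLs cannot loop forever.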
UploadedFile
Name | Path | Type | Description |
---|---|---|---|
Uploaded File URL | upload_url | string | A URL that points to your audio file, accessible only by AssemblyAI's servers |
PurgeLemurRequestDataResponse
Name | Path | Type | Description |
---|---|---|---|
Purge Request ID | request_id | uuid | The ID of the deletion request of the LeMUR request |
LeMUR Request ID to Purge | request_id_to_purge | uuid | The ID of the LeMUR request to purge the data for |
Deleted | deleted | boolean | Whether the request data was deleted |
LemurTaskResponse
Name | Path | Type | Description |
---|---|---|---|
Response | response | string | The response generated by LeMUR. |
LeMUR Request ID | request_id | uuid | The ID of the LeMUR request |
Input Tokens | usage.input_tokens | integer | The number of input tokens used by the model |
Output Tokens | usage.output_tokens | integer | The number of output tokens generated by the model |
LemurResponse
Name | Path | Type | Description |
---|---|---|---|
Response | response | string | The response generated by LeMUR. |
LeMUR Request ID | request_id | uuid | The ID of the LeMUR request |
Input Tokens | usage.input_tokens | integer | The number of input tokens used by the model |
Output Tokens | usage.output_tokens | integer | The number of output tokens generated by the model |
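
The `usage` fields returned with each LeMUR response can be summed to track token consumption across several requests. This is a minimal sketch under the assumption that each response has been parsed into a dictionary with a nested `usage` object, as the paths above suggest; `total_usage` is a hypothetical helper name.

```python
def total_usage(responses):
    """Sum input and output token counts over a list of LeMUR responses."""
    totals = {"input_tokens": 0, "output_tokens": 0}
    for response in responses:
        # Treat a missing usage object as zero tokens rather than failing.
        usage = response.get("usage") or {}
        totals["input_tokens"] += usage.get("input_tokens", 0)
        totals["output_tokens"] += usage.get("output_tokens", 0)
    return totals


# Hypothetical sample responses; the request_id values are placeholders.
responses = [
    {"request_id": "placeholder-1", "usage": {"input_tokens": 1200, "output_tokens": 80}},
    {"request_id": "placeholder-2", "usage": {"input_tokens": 300, "output_tokens": 20}},
]
print(total_usage(responses))
```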
string
This is the basic data type 'string'.