Datasets - Create
Creates a new dataset, either by fetching the data from a specified URL or by waiting for data blocks to be uploaded.
POST {endpoint}/speechtotext/v3.2-preview.2/datasets
URI Parameters
Name | In | Required | Type | Description |
---|---|---|---|---|
endpoint | path | True | string | Supported Cognitive Services endpoints (protocol and hostname, for example: https://westus.api.cognitive.microsoft.com). |
Request Body
Name | Required | Type | Description |
---|---|---|---|
displayName | True | string minLength: 1 | The display name of the object. |
kind | True | DatasetKind | |
locale | True | string minLength: 1 | The locale of the contained data. |
contentUrl | | string (uri) | The URL of the data for the dataset. |
customProperties | | object | The custom properties of this entity. The maximum allowed key length is 64 characters, the maximum allowed value length is 256 characters and the count of allowed entries is 10. |
description | | string | The description of the object. |
project | | EntityReference | |
properties | | DatasetProperties | |
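The constraints in the table above (required fields, minimum lengths, and the customProperties limits) can be checked on the client before sending a request. The following is a minimal sketch, not part of the service or any SDK: `build_dataset_body` and its parameter names are hypothetical, and the service performs its own validation regardless.

```python
# Limits quoted from the Request Body table.
MAX_CUSTOM_PROPERTY_KEY_LENGTH = 64
MAX_CUSTOM_PROPERTY_VALUE_LENGTH = 256
MAX_CUSTOM_PROPERTY_COUNT = 10


def build_dataset_body(display_name, kind, locale, content_url=None,
                       description=None, custom_properties=None):
    """Return a JSON-serializable request body, or raise ValueError.

    Hypothetical helper that mirrors the documented constraints.
    """
    # displayName, kind and locale are the three required fields.
    if not display_name:
        raise ValueError("displayName must have at least 1 character")
    if not kind:
        raise ValueError("kind is required")
    if not locale:
        raise ValueError("locale must have at least 1 character")

    body = {"displayName": display_name, "kind": kind, "locale": locale}

    # Optional fields are only included when a value was supplied.
    if content_url is not None:
        body["contentUrl"] = content_url
    if description is not None:
        body["description"] = description

    if custom_properties is not None:
        if len(custom_properties) > MAX_CUSTOM_PROPERTY_COUNT:
            raise ValueError("customProperties allows at most 10 entries")
        for key, value in custom_properties.items():
            if len(key) > MAX_CUSTOM_PROPERTY_KEY_LENGTH:
                raise ValueError("customProperties key exceeds 64 characters")
            if len(str(value)) > MAX_CUSTOM_PROPERTY_VALUE_LENGTH:
                raise ValueError("customProperties value exceeds 256 characters")
        body["customProperties"] = dict(custom_properties)

    return body
```

Omitting `content_url` produces a body without contentUrl, which corresponds to the "starts waiting for data blocks" case described at the top of this page.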
Responses
Name | Type | Description |
---|---|---|
201 Created | | The response contains information about the entity as payload and its location as header. Headers: Location (string) |
Other Status Codes | | An error occurred. |
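A client typically separates the 201 case, where the payload describes the entity and the Location header carries its URL, from every other status. The sketch below is illustrative only: `parse_create_response` and `DatasetCreateError` are hypothetical names, headers are assumed to arrive as a plain dict, and the error body is assumed to be the Error object described under Definitions.

```python
import json


class DatasetCreateError(Exception):
    """Raised when the service answers with anything other than 201."""

    def __init__(self, status, error):
        super().__init__(f"{status}: {error.get('code')}: {error.get('message')}")
        self.status = status
        self.error = error


def parse_create_response(status, headers, body_text):
    """Return (location, dataset) for 201 Created, raise otherwise."""
    if status == 201:
        return headers.get("Location"), json.loads(body_text)

    # Assumption: error responses carry a JSON object with "code" and
    # "message". Fall back to the raw text if that is not the case.
    try:
        error = json.loads(body_text)
    except ValueError:
        error = None
    if not isinstance(error, dict):
        error = {"message": body_text}
    raise DatasetCreateError(status, error)
```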
Security
Ocp-Apim-Subscription-Key
Provide your Cognitive Services account key here.
Type: apiKey
In: header
Authorization
Provide an access token from the JWT returned by the STS of this region. Make sure to add the management scope to the token by adding the following query string to the STS URL: ?scope=speechservicesmanagement
Type: apiKey
In: header
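The two schemes above amount to choosing one of two request headers. The following sketch shows one way to build them and to append the management scope to an STS URL. The function names are hypothetical, the STS URL itself is not given on this page and must be supplied by the caller, and the "Bearer" prefix on the Authorization value is an assumption based on common practice rather than something stated here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_management_scope(sts_url):
    """Return sts_url with ?scope=speechservicesmanagement applied."""
    parts = urlsplit(sts_url)
    # Keep any existing query parameters, but replace an existing scope.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != "scope"]
    query.append(("scope", "speechservicesmanagement"))
    return urlunsplit(parts._replace(query=urlencode(query)))


def auth_headers(subscription_key=None, access_token=None):
    """Build the header for exactly one of the two documented schemes."""
    if subscription_key:
        return {"Ocp-Apim-Subscription-Key": subscription_key}
    if access_token:
        # Assumption: the token is sent with the usual "Bearer" prefix.
        return {"Authorization": f"Bearer {access_token}"}
    raise ValueError("provide a subscription key or an access token")
```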
Examples
Create a dataset with content url
Sample request
POST {endpoint}/speechtotext/v3.2-preview.2/datasets
{
  "kind": "Acoustic",
  "contentUrl": "https://contoso.com/location",
  "locale": "en-US",
  "displayName": "My speech dataset name",
  "description": "My speech dataset description"
}
Sample response
Location: https://westus.api.cognitive.microsoft.com/speechtotext/v3.2-preview.2/datasets/9d5f4100-5f8e-4dd6-bd83-9bbbf50d57f1
{
  "self": "https://westus.api.cognitive.microsoft.com/speechtotext/v3.2-preview.2/datasets/9d5f4100-5f8e-4dd6-bd83-9bbbf50d57f1",
  "kind": "Acoustic",
  "contentUrl": "https://www.contoso.com/acousticdata/sourcelocation",
  "links": {
    "files": "https://westus.api.cognitive.microsoft.com/speechtotext/v3.2-preview.2/datasets/9d5f4100-5f8e-4dd6-bd83-9bbbf50d57f1/files"
  },
  "properties": {
    "textNormalizationKind": "Default",
    "acceptedLineCount": 11,
    "rejectedLineCount": 2,
    "duration": "PT4M12S"
  },
  "lastActionDateTime": "2019-01-07T11:36:07Z",
  "status": "Succeeded",
  "createdDateTime": "2019-01-07T11:34:12Z",
  "locale": "en-US",
  "displayName": "Acoustic dataset"
}
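The sample request above can be assembled with the Python standard library. This is a sketch under stated assumptions: `build_create_request` is a hypothetical helper, the Content-Type header is assumed to be application/json (this page does not state it), and the subscription key value is a placeholder. The request object is only constructed here, not sent.

```python
import json
import urllib.request

# Path taken from the POST line of this page.
API_PATH = "/speechtotext/v3.2-preview.2/datasets"


def build_create_request(endpoint, subscription_key, body):
    """Build (but do not send) the POST request that creates a dataset."""
    url = endpoint.rstrip("/") + API_PATH
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            # Assumption: the body is sent as JSON.
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": subscription_key,
        },
    )


request = build_create_request(
    "https://westus.api.cognitive.microsoft.com",
    "<your-key>",  # placeholder, not a real key
    {
        "kind": "Acoustic",
        "contentUrl": "https://contoso.com/location",
        "locale": "en-US",
        "displayName": "My speech dataset name",
        "description": "My speech dataset description",
    },
)
```

Passing `request` to `urllib.request.urlopen` would send it. On success the response status is 201 and the Location header holds the URL of the new dataset.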
Create dataset from data blocks
Sample request
POST {endpoint}/speechtotext/v3.2-preview.2/datasets
{
  "kind": "Acoustic",
  "locale": "en-US",
  "displayName": "My speech dataset name",
  "description": "My speech dataset description"
}
Sample response
{
  "self": "https://westus.api.cognitive.microsoft.com/speechtotext/v3.2-preview.2/datasets/9d5f4100-5f8e-4dd6-bd83-9bbbf50d57f1",
  "kind": "Acoustic",
  "links": {
    "files": "https://westus.api.cognitive.microsoft.com/speechtotext/v3.2-preview.2/datasets/9d5f4100-5f8e-4dd6-bd83-9bbbf50d57f1/files",
    "commitBlocks": "https://westus.api.cognitive.microsoft.com/speechtotext/v3.2-preview.2/datasets/9d5f4100-5f8e-4dd6-bd83-9bbbf50d57f1/blocks:commit",
    "listBlocks": "https://westus.api.cognitive.microsoft.com/speechtotext/v3.2-preview.2/datasets/9d5f4100-5f8e-4dd6-bd83-9bbbf50d57f1/blocks",
    "uploadBlocks": "https://westus.api.cognitive.microsoft.com/speechtotext/v3.2-preview.2/datasets/9d5f4100-5f8e-4dd6-bd83-9bbbf50d57f1/blocks"
  },
  "lastActionDateTime": "2019-01-07T11:36:07Z",
  "status": "NotStarted",
  "createdDateTime": "2019-01-07T11:34:12Z",
  "locale": "en-US",
  "displayName": "Acoustic dataset"
}
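In the two sample responses, only the dataset created without a contentUrl carries the uploadBlocks, listBlocks, and commitBlocks links. A client could use their presence to decide whether the block upload flow applies. The helper below is a hypothetical sketch of that check; treating the links as the signal is an inference from the samples, not a documented rule.

```python
BLOCK_LINK_NAMES = ("uploadBlocks", "listBlocks", "commitBlocks")


def block_upload_links(dataset):
    """Return the three block links, or None if the dataset lacks them."""
    links = dataset.get("links", {})
    if not all(name in links for name in BLOCK_LINK_NAMES):
        return None
    return {name: links[name] for name in BLOCK_LINK_NAMES}


base = ("https://westus.api.cognitive.microsoft.com/speechtotext/"
        "v3.2-preview.2/datasets/9d5f4100-5f8e-4dd6-bd83-9bbbf50d57f1")

# Shapes copied from the two sample responses on this page.
from_blocks = {"links": {"files": base + "/files",
                         "commitBlocks": base + "/blocks:commit",
                         "listBlocks": base + "/blocks",
                         "uploadBlocks": base + "/blocks"}}
from_url = {"links": {"files": base + "/files"}}

links_for_blocks = block_upload_links(from_blocks)
links_for_url = block_upload_links(from_url)
```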
Definitions
Name | Description |
---|---|
Dataset | Dataset |
DatasetKind | DatasetKind |
DatasetLinks | DatasetLinks |
DatasetProperties | DatasetProperties |
DetailedErrorCode | DetailedErrorCode |
EntityError | EntityError |
EntityReference | EntityReference |
Error | Error |
ErrorCode | ErrorCode |
InnerError | InnerError |
Status | Status |
TextNormalizationKind | TextNormalizationKind |
Dataset
Name | Type | Description |
---|---|---|
contentUrl | string (uri) | The URL of the data for the dataset. |
createdDateTime | string (date-time) | The timestamp when the object was created. The timestamp is encoded in ISO 8601 date and time format ("YYYY-MM-DDThh:mm:ssZ", see https://en.wikipedia.org/wiki/ISO_8601#Combined_date_and_time_representations). |
customProperties | object | The custom properties of this entity. The maximum allowed key length is 64 characters, the maximum allowed value length is 256 characters and the count of allowed entries is 10. |
description | string | The description of the object. |
displayName | string minLength: 1 | The display name of the object. |
kind | DatasetKind | |
lastActionDateTime | string (date-time) | The timestamp when the current status was entered. The timestamp is encoded in ISO 8601 date and time format ("YYYY-MM-DDThh:mm:ssZ", see https://en.wikipedia.org/wiki/ISO_8601#Combined_date_and_time_representations). |
links | DatasetLinks | |
locale | string minLength: 1 | The locale of the contained data. |
project | EntityReference | |
properties | DatasetProperties | |
self | string (uri) | The location of this entity. |
status | Status | |
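The createdDateTime and lastActionDateTime values use the "YYYY-MM-DDThh:mm:ssZ" form, so they can be parsed with the standard library. A minimal sketch, assuming the values never carry fractional seconds or a numeric offset (the documented pattern shows neither); `parse_timestamp` and `seconds_since_creation` are hypothetical names.

```python
from datetime import datetime, timezone

# strptime pattern matching the documented "YYYY-MM-DDThh:mm:ssZ" form.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_timestamp(value):
    """Parse a timestamp string into a timezone-aware UTC datetime."""
    # The trailing "Z" means UTC, so attach that time zone explicitly.
    return datetime.strptime(value, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)


def seconds_since_creation(dataset):
    """Seconds between creation and the most recent status change."""
    created = parse_timestamp(dataset["createdDateTime"])
    last_action = parse_timestamp(dataset["lastActionDateTime"])
    return (last_action - created).total_seconds()


# Timestamps taken from the sample responses on this page.
sample = {"createdDateTime": "2019-01-07T11:34:12Z",
          "lastActionDateTime": "2019-01-07T11:36:07Z"}
elapsed = seconds_since_creation(sample)  # 115.0
```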
DatasetKind
Value | Description |
---|---|
Language | A language dataset. |
Acoustic | An acoustic dataset. |
Pronunciation | A pronunciation dataset. |
AudioFiles | An audio files dataset. |
LanguageMarkdown | A language markdown dataset. |
OutputFormatting | A dataset that contains rules to customize inverse text normalization, capitalization, reformulation, and profanity, and that also defines tests for dataset validation. |
DatasetLinks
Name | Type | Description |
---|---|---|
commitBlocks | string (uri) | The location to commit the list of blocks when uploading a dataset using blocks. See operation "Datasets_CommitBlocks" for more details. |
files | string (uri) | The location to get all files of this entity. See operation "Datasets_ListFiles" for more details. |
listBlocks | string (uri) | The location to list the already uploaded blocks of this entity when uploading a dataset using blocks. See operation "Datasets_GetBlocks" for more details. |
uploadBlocks | string (uri) | The location to upload blocks to when uploading a dataset using blocks. See operation "Datasets_UploadBlock" for more details. |
DatasetProperties
Name | Type | Description |
---|---|---|
acceptedLineCount | integer (int32) | The number of lines accepted for this dataset. |
duration | string | The total duration of the dataset if it contains audio files. The duration is encoded as ISO 8601 duration ("PnYnMnDTnHnMnS", see https://en.wikipedia.org/wiki/ISO_8601#Durations). |
email | string | The email address to send email notifications to in case the operation completes. The value will be removed after successfully sending the email. |
error | EntityError | |
rejectedLineCount | integer (int32) | The number of lines rejected for this dataset. |
textNormalizationKind | TextNormalizationKind | |
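The duration property is an ISO 8601 duration string such as "PT4M12S" in the sample response. Python's standard library has no built-in parser for this form, so the sketch below uses a regular expression. `parse_duration` is a hypothetical helper. It deliberately rejects year and month components, because they have no fixed length in seconds, and it assumes those components do not occur in audio durations.

```python
import re
from datetime import timedelta

# Matches the "PnYnMnDTnHnMnS" pattern, with every component optional.
_DURATION = re.compile(
    r"^P(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$"
)


def parse_duration(value):
    """Convert an ISO 8601 duration string to a timedelta."""
    match = _DURATION.match(value)
    # Reject strings with no components, such as "P", "PT" or "P1DT".
    if match is None or value == "P" or value.endswith("T"):
        raise ValueError(f"not an ISO 8601 duration: {value!r}")
    parts = match.groupdict()
    if parts["years"] or parts["months"]:
        raise ValueError("year and month components are not supported")
    return timedelta(
        days=int(parts["days"] or 0),
        hours=int(parts["hours"] or 0),
        minutes=int(parts["minutes"] or 0),
        seconds=float(parts["seconds"] or 0),
    )


total_seconds = parse_duration("PT4M12S").total_seconds()  # 252.0
```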
DetailedErrorCode
Value | Description |
---|---|
InvalidParameterValue | Invalid parameter value. |
InvalidRequestBodyFormat | Invalid request body format. |
EmptyRequest | Empty Request. |
MissingInputRecords | Missing Input Records. |
InvalidDocument | Invalid Document. |
ModelVersionIncorrect | Model Version Incorrect. |
InvalidDocumentBatch | Invalid Document Batch. |
UnsupportedLanguageCode | Unsupported language code. |
DataImportFailed | Data import failed. |
InUseViolation | In use violation. |
InvalidLocale | Invalid locale. |
InvalidBaseModel | Invalid base model. |
InvalidAdaptationMapping | Invalid adaptation mapping. |
InvalidDataset | Invalid dataset. |
InvalidTest | Invalid test. |
FailedDataset | Failed dataset. |
InvalidModel | Invalid model. |
InvalidTranscription | Invalid transcription. |
InvalidPayload | Invalid payload. |
InvalidParameter | Invalid parameter. |
EndpointWithoutLogging | Endpoint without logging. |
InvalidPermissions | Invalid permissions. |
InvalidPrerequisite | Invalid prerequisite. |
InvalidProductId | Invalid product id. |
InvalidSubscription | Invalid subscription. |
InvalidProject | Invalid project. |
InvalidProjectKind | Invalid project kind. |
InvalidRecordingsUri | Invalid recordings uri. |
OnlyOneOfUrlsOrContainerOrDataset | Only one of urls or container or dataset. |
ExceededNumberOfRecordingsUris | Exceeded number of recordings uris. |
ModelMismatch | Model mismatch. |
ProjectGenderMismatch | Project gender mismatch. |
ModelDeprecated | Model deprecated. |
ModelExists | Model exists. |
ModelNotDeployable | Model not deployable. |
EndpointNotUpdatable | Endpoint not updatable. |
SingleDefaultEndpoint | Single default endpoint. |
EndpointCannotBeDefault | Endpoint cannot be default. |
InvalidModelUri | Invalid model uri. |
SubscriptionNotFound | Subscription not found. |
QuotaViolation | Quota violation. |
UnsupportedDelta | Unsupported delta. |
UnsupportedFilter | Unsupported filter. |
UnsupportedPagination | Unsupported pagination. |
UnsupportedDynamicConfiguration | Unsupported dynamic configuration. |
UnsupportedOrderBy | Unsupported order by. |
NoUtf8WithBom | No utf8 with bom. |
ModelDeploymentNotCompleteState | Model deployment not complete state. |
SkuLimitsExist | Sku limits exist. |
DeployingFailedModel | Deploying failed model. |
UnsupportedTimeRange | Unsupported time range. |
InvalidLogDate | Invalid log date. |
InvalidLogId | Invalid log id. |
InvalidLogStartTime | Invalid log start time. |
InvalidLogEndTime | Invalid log end time. |
InvalidTopForLogs | Invalid top for logs. |
InvalidSkipTokenForLogs | Invalid skip token for logs. |
DeleteNotAllowed | Delete not allowed. |
Forbidden | Forbidden. |
DeployNotAllowed | Deploy not allowed. |
UnexpectedError | Unexpected error. |
InvalidCollection | Invalid collection. |
InvalidCallbackUri | Invalid callback uri. |
InvalidSasValidityDuration | Invalid sas validity duration. |
InaccessibleCustomerStorage | Inaccessible customer storage. |
UnsupportedClassBasedAdaptation | Unsupported class based adaptation. |
InvalidWebHookEventKind | Invalid web hook event kind. |
InvalidTimeToLive | Invalid time to live. |
InvalidSourceAzureResourceId | Invalid source Azure resource ID. |
ModelCopyOperationExists | Model copy operation exists. |
EntityError
Name | Type | Description |
---|---|---|
code | string | The code of this error. |
message | string | The message for this error. |
EntityReference
Name | Type | Description |
---|---|---|
self | string (uri) | The location of the referenced entity. |
Error
Name | Type | Description |
---|---|---|
code | ErrorCode | |
details | Error[] | Additional supportive details regarding the error and/or expected policies. |
innerError | InnerError | |
message | string | High level error message. |
target | string | The source of the error. For example, it would be "documents" or "document id" in the case of an invalid document. |
ErrorCode
Value | Description |
---|---|
InvalidRequest | Representing the invalid request error code. |
InvalidArgument | Representing the invalid argument error code. |
InternalServerError | Representing the internal server error error code. |
ServiceUnavailable | Representing the service unavailable error code. |
NotFound | Representing the not found error code. |
PipelineError | Representing the pipeline error error code. |
Conflict | Representing the conflict error code. |
InternalCommunicationFailed | Representing the internal communication failed error code. |
Forbidden | Representing the forbidden error code. |
NotAllowed | Representing the not allowed error code. |
Unauthorized | Representing the unauthorized error code. |
UnsupportedMediaType | Representing the unsupported media type error code. |
TooManyRequests | Representing the too many requests error code. |
UnprocessableEntity | Representing the unprocessable entity error code. |
InnerError
Name | Type | Description |
---|---|---|
code | DetailedErrorCode | |
details | object | Additional supportive details regarding the error and/or expected policies. |
innerError | InnerError | |
message | string | High level error message. |
target | string | The source of the error. For example, it would be "documents" or "document id" in the case of an invalid document. |
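Because Error contains an InnerError, and InnerError can itself contain another InnerError, the most specific code sits at the end of that chain. The sketch below walks the chain. `innermost_error` is a hypothetical helper, and the sample payload is invented for illustration: it uses values from the ErrorCode and DetailedErrorCode tables, but its messages are not real service output.

```python
def innermost_error(error):
    """Follow innerError links and return the deepest (code, message)."""
    current = error
    # Stop at the first level that has no nested innerError object.
    while isinstance(current.get("innerError"), dict):
        current = current["innerError"]
    return current.get("code"), current.get("message")


# Invented payload for illustration only.
sample_error = {
    "code": "InvalidRequest",
    "message": "The request is invalid.",
    "innerError": {
        "code": "InvalidLocale",
        "message": "The locale is not supported.",
    },
}

code, message = innermost_error(sample_error)
```

With this payload, `code` is "InvalidLocale", which is more actionable than the top-level "InvalidRequest".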
Status
Value | Description |
---|---|
NotStarted | The long running operation has not yet started. |
Running | The long running operation is currently processing. |
Succeeded | The long running operation has successfully completed. |
Failed | The long running operation has failed. |
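Of the four status values, Succeeded and Failed are final, so a client that waits for a dataset can poll until one of them appears. The loop below is a sketch only: `wait_for_dataset` is a hypothetical helper, `fetch_dataset` stands in for whatever call retrieves the current Dataset object, and the interval and poll limit are arbitrary defaults rather than service recommendations.

```python
import time

# Statuses after which the long running operation no longer changes.
TERMINAL_STATUSES = {"Succeeded", "Failed"}


def wait_for_dataset(fetch_dataset, interval_seconds=5.0, max_polls=60,
                     sleep=time.sleep):
    """Call fetch_dataset() until the status is terminal, then return it."""
    for attempt in range(max_polls):
        dataset = fetch_dataset()
        if dataset.get("status") in TERMINAL_STATUSES:
            return dataset
        # Do not sleep after the final attempt.
        if attempt + 1 < max_polls:
            sleep(interval_seconds)
    raise TimeoutError(f"dataset not finished after {max_polls} polls")


# Stand-in for repeated GET requests: yields a scripted status sequence.
_states = iter([{"status": "NotStarted"}, {"status": "Running"},
                {"status": "Succeeded"}])
final = wait_for_dataset(lambda: next(_states), sleep=lambda seconds: None)
```

Injecting `sleep` keeps the loop testable without real delays. A production caller would leave it at the default.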
TextNormalizationKind
Value | Description |
---|---|
Default | Default text normalization (e.g. '2 to 3' is replaced by 'two to three' in en-US). |
None | No text normalization will be applied to the input text. This is an override option that should only be used when text is normalized before the upload. |