Text Independent - Create Enrollment

Reference

Service: Speaker Recognition

API Version: 2021-09-05

Enroll Profile
Adds an enrollment to an existing profile. The first enrollment must be a predefined activation phrase, which can be listed using the /phrases/{locale} API. Once the minimum number of requested enrollment audios is reached, a voice print is created. Any further enrollments are used to improve the voice print.

Limitations:

Minimum audio input length per request is 1 second.
Maximum audio input length per request is 120 seconds.
Minimum total effective speech length (excluding silence and other non-speech frames) for creating a voice print is 20 seconds. This limitation can be disabled by setting ignoreMinLength to true.
Maximum total audio input length allowed for creating a voice print is 300 seconds.
Minimum audio signal-to-noise ratio (SNR) is 2 dB.
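
The per-request length limits above can be checked locally before uploading. The following is a minimal sketch using only the Python standard library; the function names and constants are illustrative and are not part of the service. It measures total clip duration only and does not estimate effective speech length or SNR, which the service computes itself.

```python
import io
import wave

# Per-request limits taken from the Limitations list above.
MIN_AUDIO_SECONDS = 1.0
MAX_AUDIO_SECONDS = 120.0


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Return the duration of a PCM WAV payload in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_request_length(wav_bytes: bytes) -> float:
    """Return the clip duration, or raise ValueError if it is out of range."""
    duration = wav_duration_seconds(wav_bytes)
    if duration < MIN_AUDIO_SECONDS:
        raise ValueError(f"audio is {duration:.2f}s, shorter than {MIN_AUDIO_SECONDS}s")
    if duration > MAX_AUDIO_SECONDS:
        raise ValueError(f"audio is {duration:.2f}s, longer than {MAX_AUDIO_SECONDS}s")
    return duration
```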

POST {endpoint}/speaker-recognition/identification/text-independent/profiles/{profileId}/enrollments?api-version=2021-09-05

With optional parameters:

POST {endpoint}/speaker-recognition/identification/text-independent/profiles/{profileId}/enrollments?api-version=2021-09-05&ignoreMinLength={ignoreMinLength}

URI Parameters

Name	In	Required	Type	Description
endpoint	path	True	string	Supported Cognitive Services endpoints (protocol and hostname, for example: https://westus.api.cognitive.microsoft.com).
profileId	path	True	string (uuid)	Unique identifier of the profile (GUID).
api-version	query	True	string	Specifies the version of the operation to use for this request.
ignoreMinLength	query		boolean	If true, a voice print will be created immediately for this profile regardless of how much speech is supplied or stored. Default is false.
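
As an illustration of how the URI parameters combine, the sketch below assembles the request URL with the Python standard library. The helper name build_enrollment_url is hypothetical; the path and query parameter names come from the request line above.

```python
from urllib.parse import urlencode
from uuid import UUID

API_VERSION = "2021-09-05"


def build_enrollment_url(endpoint: str, profile_id: str,
                         ignore_min_length: bool = False) -> str:
    """Build the Create Enrollment URL for a text-independent profile."""
    # UUID() raises ValueError if profile_id is not a well-formed GUID.
    profile = str(UUID(profile_id))
    query = {"api-version": API_VERSION}
    if ignore_min_length:
        # Optional parameter; omitted when false because false is the default.
        query["ignoreMinLength"] = "true"
    path = ("/speaker-recognition/identification/text-independent"
            f"/profiles/{profile}/enrollments")
    return f"{endpoint.rstrip('/')}{path}?{urlencode(query)}"
```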

Request Header

Media Types: "audio/wav; codecs=audio/pcm"

Name	Required	Type	Description
Ocp-Apim-Subscription-Key	True	string

Request Body

Media Types: "audio/wav; codecs=audio/pcm"

Name	Type	Description
audioData	object	Binary audio file. Supported formats are audio/wav; codecs=audio/pcm. Supports audio up to 5MB.
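
A request with the header and body described above might be prepared as follows. This is a sketch, not an official client: make_enrollment_request is a hypothetical helper, and interpreting "5MB" as 5 * 1024 * 1024 bytes is an assumption.

```python
import urllib.request

# Assumption: "5MB" is read as 5 * 1024 * 1024 bytes.
MAX_BODY_BYTES = 5 * 1024 * 1024


def make_enrollment_request(url: str, subscription_key: str,
                            wav_bytes: bytes) -> urllib.request.Request:
    """Prepare (but do not send) the POST request carrying the audio."""
    if len(wav_bytes) > MAX_BODY_BYTES:
        raise ValueError("audio payload exceeds the 5MB limit")
    return urllib.request.Request(
        url,
        data=wav_bytes,  # raw binary WAV file as the request body
        headers={
            "Ocp-Apim-Subscription-Key": subscription_key,
            "Content-Type": "audio/wav; codecs=audio/pcm",
        },
        method="POST",
    )
```

Passing the prepared request to urllib.request.urlopen sends it; urlopen raises urllib.error.HTTPError for non-success status codes.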

Responses

Name	Type	Description
201 Created	TiEnrollmentInfo	Created
Other Status Codes	SpeakerErrorInfo	Failure. Headers: x-ms-error-code: string

Security

Ocp-Apim-Subscription-Key

Type: apiKey
In: header

Examples

Successful Query

Sample Request

HTTP

POST https://westus.api.cognitive.microsoft.com/speaker-recognition/identification/text-independent/profiles/49a36324-fc4b-4387-aa06-090cfbf0064f/enrollments?api-version=2021-09-05


"{binary file data}"

Sample Response

Status code: 201

Content-Type: application/json

Response Body

{
  "profileId": "49a36324-fc4b-4387-aa06-090cfbf0064f",
  "enrollmentStatus": "Enrolling",
  "enrollmentsCount": 1,
  "enrollmentsLengthInSec": 1.83,
  "enrollmentsSpeechLengthInSec": 1.35,
  "remainingEnrollmentsSpeechLengthInSec": 18.65,
  "audioLengthInSec": 1.83,
  "audioSpeechLengthInSec": 1.35
}
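
To show how the response fields relate to one another, the sketch below parses a 201 body into a typed record and derives an enrollment progress figure. The class layout and the progress method are illustrative assumptions; only the JSON field names come from the response above.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TiEnrollmentInfo:
    profile_id: str
    enrollment_status: str
    enrollments_count: int
    enrollments_length_sec: float
    enrollments_speech_length_sec: float
    remaining_enrollments_speech_length_sec: float
    audio_length_sec: float
    audio_speech_length_sec: float

    @classmethod
    def from_json(cls, body: str) -> "TiEnrollmentInfo":
        """Map the JSON field names of the response onto this record."""
        data = json.loads(body)
        return cls(
            profile_id=data["profileId"],
            enrollment_status=data["enrollmentStatus"],
            enrollments_count=data["enrollmentsCount"],
            enrollments_length_sec=data["enrollmentsLengthInSec"],
            enrollments_speech_length_sec=data["enrollmentsSpeechLengthInSec"],
            remaining_enrollments_speech_length_sec=data[
                "remainingEnrollmentsSpeechLengthInSec"],
            audio_length_sec=data["audioLengthInSec"],
            audio_speech_length_sec=data["audioSpeechLengthInSec"],
        )

    def progress(self) -> float:
        """Fraction of the needed speech collected so far (0.0 to 1.0)."""
        done = self.enrollments_speech_length_sec
        total = done + self.remaining_enrollments_speech_length_sec
        return 1.0 if total == 0 else done / total
```

For the sample body, 1.35 seconds of speech have been accepted and 18.65 seconds remain, so progress() is about 0.07.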

Status code: default

Content-Type: application/json
x-ms-error-code: Error Code

Response Body

{
  "error": {
    "code": "Error Code",
    "message": "Error Message"
  }
}
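
A failure body of this shape can be turned into an exception on the client side. The sketch below is one possible approach; the SpeakerRecognitionError class and parse_error helper are hypothetical names, not part of any SDK.

```python
import json


class SpeakerRecognitionError(Exception):
    """Carries the HTTP status plus the code and message from the error body."""

    def __init__(self, status: int, code: str, message: str):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message


def parse_error(status: int, body: str) -> SpeakerRecognitionError:
    """Build an exception from a SpeakerErrorInfo JSON body."""
    error = json.loads(body).get("error", {})
    # Both fields are plain strings; fall back to "" if either is absent.
    return SpeakerRecognitionError(
        status, error.get("code", ""), error.get("message", ""))
```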

Definitions

Name	Description
Error
SpeakerErrorInfo	Speaker error message
TiEnrollmentInfo	Speaker profile enrollment info
TrainingStatusType	Status representing the current state of the profile enrollment. Available values are: Enrolling: the profile has no voice print and is not ready for recognition requests. Training: the voice print of the profile is being created and cannot be used for recognition at the moment. Enrolled: the profile has a voice print and is ready for recognition requests.

Error

Name	Type	Description
code	string
message	string

SpeakerErrorInfo

Speaker error message

Name	Type	Description
error	Error

TiEnrollmentInfo

Speaker profile enrollment info

Name	Type	Description
audioLengthInSec	number	Length of this enrollment audio in seconds.
audioSpeechLengthInSec	number	Length of pure speech in this enrollment audio (the amount of audio left after removing silence and non-speech segments) in seconds.
enrollmentStatus	TrainingStatusType	Status representing the current state of the profile enrollment. Available values are: Enrolling: the profile has no voice print and is not ready for recognition requests. Training: the voice print of the profile is being created and cannot be used for recognition at the moment. Enrolled: the profile has a voice print and is ready for recognition requests.
enrollmentsCount	integer	Number of enrollment audios accepted for this profile.
enrollmentsLengthInSec	number	Total length of enrollment audios accepted for this profile in seconds.
enrollmentsSpeechLengthInSec	number	Sum of pure speech (the amount of audio left after removing silence and non-speech segments) across all profile enrollments in seconds.
profileId	string	Unique identifier of the profile (GUID).
remainingEnrollmentsSpeechLengthInSec	number	Amount of pure speech (the amount of audio left after removing silence and non-speech segments) still needed to complete profile enrollment in seconds.

TrainingStatusType

Status representing the current state of the profile enrollment. Available values are:

Enrolling: the profile has no voice print and is not ready for recognition requests.
Training: the voice print of the profile is being created and cannot be used for recognition at the moment.
Enrolled: the profile has a voice print and is ready for recognition requests.

Name	Type	Description
Enrolled	string
Enrolling	string
Training	string
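
The three status values lend themselves to an enumeration on the client side. The sketch below is illustrative; the helper ready_for_recognition is a hypothetical name, and it simply encodes the rule that only Enrolled profiles are ready for recognition requests.

```python
from enum import Enum


class TrainingStatusType(str, Enum):
    ENROLLING = "Enrolling"  # no voice print yet
    TRAINING = "Training"    # voice print is being created
    ENROLLED = "Enrolled"    # voice print exists


def ready_for_recognition(status: str) -> bool:
    """True only for Enrolled; raises ValueError for an unknown status."""
    return TrainingStatusType(status) is TrainingStatusType.ENROLLED
```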
