Text Dependent - Create Enrollment

Enroll Profile
Adds an enrollment to an existing profile. Once the minimum number of enrollment audios is reached, a voiceprint is created. If a voiceprint already exists, it is recreated from all existing enrollment audios, including the new one.

Limitations:

  • Minimum audio input length per request is 1 second
  • Maximum audio input length per request is 10 seconds
  • Minimum number of enrollments for creating a voiceprint is 3
  • Maximum number of enrollments for creating a voiceprint is 50
  • Minimum audio signal-to-noise ratio (SNR) is 2 dB
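
The per-request length limits above can be checked locally before uploading. The following is a minimal sketch using only the Python standard library; the function names and the 16 kHz sample rate of the demo helper are assumptions made for illustration, and the SNR limit is not checked because it cannot be derived from the WAV header.

```python
import io
import wave
from typing import Optional

MIN_SECONDS = 1.0   # minimum audio input length per request (from the list above)
MAX_SECONDS = 10.0  # maximum audio input length per request (from the list above)


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Return the duration of an in-memory PCM WAV file in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def length_problem(wav_bytes: bytes) -> Optional[str]:
    """Describe a length violation, or return None if the length is acceptable."""
    seconds = wav_duration_seconds(wav_bytes)
    if seconds < MIN_SECONDS:
        return f"audio is {seconds:.2f} s, shorter than the {MIN_SECONDS:.0f} s minimum"
    if seconds > MAX_SECONDS:
        return f"audio is {seconds:.2f} s, longer than the {MAX_SECONDS:.0f} s maximum"
    return None


def make_silent_wav(seconds: float, rate: int = 16000) -> bytes:
    """Demo helper (hypothetical): build a mono 16-bit WAV containing silence."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buffer.getvalue()
```

A caller might run `length_problem(data)` on each recording and ask the user to re-record when it returns a message.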

Constraints:

  • The first enrollment must match an existing passphrase.
  • All enrollments after the first must use the same passphrase as the first enrollment.

POST {endpoint}/speaker-recognition/verification/text-dependent/profiles/{profileId}/enrollments?api-version=2021-09-05
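
As an illustration of how the URI template is filled in, the sketch below assembles the request URL from its three parameters. The function name `enrollment_url` is hypothetical; validating `profileId` with `uuid.UUID` is a design choice of this sketch, not something the service requires of clients.

```python
import uuid
from urllib.parse import urlencode


def enrollment_url(endpoint: str, profile_id: str, api_version: str = "2021-09-05") -> str:
    """Build the Create Enrollment URL for a text-dependent verification profile."""
    # uuid.UUID raises ValueError if profile_id is not a well-formed GUID.
    profile = str(uuid.UUID(profile_id))
    path = (
        "/speaker-recognition/verification/text-dependent"
        f"/profiles/{profile}/enrollments"
    )
    query = urlencode({"api-version": api_version})
    return endpoint.rstrip("/") + path + "?" + query
```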

URI Parameters

Name         In     Required  Type           Description
endpoint     path   True      string         Supported Cognitive Services endpoints (protocol and hostname, for example: https://westus.api.cognitive.microsoft.com).
profileId    path   True      string (uuid)  Unique identifier of the profile (GUID).
api-version  query  True      string         Specifies the version of the operation to use for this request.

Request Header

Media Types: "audio/wav; codecs=audio/pcm"

Name                       Required  Type    Description
Ocp-Apim-Subscription-Key  True      string

Request Body

Media Types: "audio/wav; codecs=audio/pcm"

Name       Type    Description
audioData  object  Binary audio file. The supported format is audio/wav; codecs=audio/pcm. Supports audio up to 5 MB.
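
Putting the header and body together, a request could be prepared as sketched below with the standard library's `urllib.request`. The function name is hypothetical, and treating "5MB" as 5,000,000 bytes is a conservative assumption, since the exact byte limit is not stated here.

```python
import urllib.request

# Assumption: "up to 5MB" is interpreted conservatively as 5,000,000 bytes.
MAX_BODY_BYTES = 5_000_000


def build_enrollment_request(url: str, subscription_key: str, wav_bytes: bytes) -> urllib.request.Request:
    """Prepare (but do not send) the POST request carrying one enrollment audio."""
    if len(wav_bytes) > MAX_BODY_BYTES:
        raise ValueError(f"audio is {len(wav_bytes)} bytes, above the assumed 5 MB limit")
    return urllib.request.Request(
        url,
        data=wav_bytes,  # raw binary WAV file as the request body
        headers={
            "Ocp-Apim-Subscription-Key": subscription_key,
            "Content-Type": "audio/wav; codecs=audio/pcm",
        },
        method="POST",
    )
```

The prepared request can then be sent with `urllib.request.urlopen(request)`, which raises `urllib.error.HTTPError` for non-success status codes.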

Responses

Name                Type              Description
201 Created         TdEnrollmentInfo  Created
Other Status Codes  SpeakerErrorInfo  Failure. Headers: x-ms-error-code: string

Security

Ocp-Apim-Subscription-Key

Type: apiKey
In: header

Examples

Successful Query

Sample Request

POST https://westus.api.cognitive.microsoft.com/speaker-recognition/verification/text-dependent/profiles/49a36324-fc4b-4387-aa06-090cfbf0064f/enrollments?api-version=2021-09-05


"{binary file data}"

Sample Response

201 Created

Content-Type: application/json
{
  "profileId": "49a36324-fc4b-4387-aa06-090cfbf0064f",
  "enrollmentStatus": "Enrolling",
  "enrollmentsCount": 1,
  "enrollmentsLengthInSec": 1.83,
  "enrollmentsSpeechLengthInSec": 1.35,
  "remainingEnrollmentsCount": 2,
  "passPhrase": "my voice is my passport verify me",
  "audioLengthInSec": 1.83,
  "audioSpeechLengthInSec": 1.35
}

Other Status Codes

Content-Type: application/json
x-ms-error-code: Error Code
{
  "error": {
    "code": "Error Code",
    "message": "Error Message"
  }
}
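
The two sample payloads can be told apart by status code. The sketch below, with the hypothetical helper `describe_response`, shows one way a client might summarize either outcome; it assumes the body is the JSON text exactly as shown in the samples.

```python
import json


def describe_response(status_code: int, body: str) -> str:
    """Summarize a Create Enrollment response in one line."""
    payload = json.loads(body)
    if status_code == 201:
        status = payload["enrollmentStatus"]
        remaining = payload["remainingEnrollmentsCount"]
        if remaining > 0:
            return f"{status}: {remaining} more enrollment(s) needed"
        return f"{status}: no further enrollments required"
    # Any other status code carries a SpeakerErrorInfo body.
    error = payload.get("error", {})
    return f"failed ({error.get('code', 'unknown')}): {error.get('message', '')}"
```

For the successful sample above, this yields "Enrolling: 2 more enrollment(s) needed", which tells the caller to collect two more recordings of the same passphrase.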

Definitions

Name                Description
Error
SpeakerErrorInfo    Speaker error message
TdEnrollmentInfo    Text-Dependent Speaker profile enrollment info
TrainingStatusType  Status representing the current state of the profile. Available values are:

  • Enrolling: the profile has no voiceprint and is not ready for recognition requests.
  • Training: the voiceprint of the profile is being created and can’t be used for recognition at the moment.
  • Enrolled: the profile has a voiceprint and is ready for recognition requests.

Error

Name     Type    Description
code     string
message  string

SpeakerErrorInfo

Speaker error message

Name   Type   Description
error  Error

TdEnrollmentInfo

Text-Dependent Speaker profile enrollment info

Name                          Type                Description
audioLengthInSec              number              Length of this enrollment audio in seconds.
audioSpeechLengthInSec        number              Length in seconds of pure speech in this enrollment audio (the audio remaining after silence and non-speech segments are removed).
enrollmentStatus              TrainingStatusType  Status representing the current state of the profile. Available values are Enrolling (the profile has no voiceprint and is not ready for recognition requests), Training (the voiceprint of the profile is being created and can’t be used for recognition at the moment), and Enrolled (the profile has a voiceprint and is ready for recognition requests).
enrollmentsCount              integer             Number of enrollment audios accepted for this profile.
enrollmentsLengthInSec        number              Total length in seconds of the enrollment audios accepted for this profile.
enrollmentsSpeechLengthInSec  number              Total length in seconds of pure speech (the audio remaining after silence and non-speech segments are removed) across all enrollments of the profile.
passPhrase                    string              Passphrase associated with this enrollment.
profileId                     string              Unique identifier of the profile (GUID).
remainingEnrollmentsCount     integer             Number of enrollment audios still needed to complete profile enrollment.
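
For clients that prefer typed access over raw dictionaries, the fields above map naturally onto a small record type. This is a sketch only: the snake_case attribute names, the `from_json` constructor, and the `speech_ratio` convenience property are assumptions of this example and are not part of the API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TdEnrollmentInfo:
    """Typed view of the TdEnrollmentInfo response body."""
    profile_id: str
    enrollment_status: str
    enrollments_count: int
    enrollments_length_in_sec: float
    enrollments_speech_length_in_sec: float
    remaining_enrollments_count: int
    pass_phrase: str
    audio_length_in_sec: float
    audio_speech_length_in_sec: float

    @classmethod
    def from_json(cls, payload: dict) -> "TdEnrollmentInfo":
        """Build the record from the decoded JSON response."""
        return cls(
            profile_id=payload["profileId"],
            enrollment_status=payload["enrollmentStatus"],
            enrollments_count=payload["enrollmentsCount"],
            enrollments_length_in_sec=payload["enrollmentsLengthInSec"],
            enrollments_speech_length_in_sec=payload["enrollmentsSpeechLengthInSec"],
            remaining_enrollments_count=payload["remainingEnrollmentsCount"],
            pass_phrase=payload["passPhrase"],
            audio_length_in_sec=payload["audioLengthInSec"],
            audio_speech_length_in_sec=payload["audioSpeechLengthInSec"],
        )

    @property
    def speech_ratio(self) -> float:
        """Fraction of this enrollment audio that was pure speech (0.0 if empty)."""
        if self.audio_length_in_sec <= 0:
            return 0.0
        return self.audio_speech_length_in_sec / self.audio_length_in_sec
```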

TrainingStatusType

Status representing the current state of the profile. Available values are:

  • Enrolling: the profile has no voiceprint and is not ready for recognition requests.
  • Training: the voiceprint of the profile is being created and can’t be used for recognition at the moment.
  • Enrolled: the profile has a voiceprint and is ready for recognition requests.

Name       Type    Description
Enrolled   string
Enrolling  string
Training   string
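
The three status values can be modeled as an enumeration so that unexpected strings fail loudly. The sketch below is one possible client-side representation; the helper `ready_for_recognition` is a hypothetical name.

```python
from enum import Enum


class TrainingStatusType(str, Enum):
    """Client-side mirror of the three profile states."""
    ENROLLING = "Enrolling"  # no voiceprint yet
    TRAINING = "Training"    # voiceprint is being created
    ENROLLED = "Enrolled"    # voiceprint exists


def ready_for_recognition(status: str) -> bool:
    """Return True only when the profile has a usable voiceprint.

    Raises ValueError if the status string is not one of the three known values.
    """
    return TrainingStatusType(status) is TrainingStatusType.ENROLLED
```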