Text Independent - Create Enrollment

Enroll Profile
Adds an enrollment to an existing profile. The first enrollment must be a predefined activation phrase, which can be listed using the /phrases/{locale} API. If the minimum number of requested enrollment audios is reached, a voice print is created. Any further enrollments are used to improve the voice print.

Limitations:

  • Minimum audio input length per request is 1 second

  • Maximum audio input length per request is 120 seconds

  • Minimum total effective speech length (excluding silence and other non-speech frames) for creating a voiceprint is 20 seconds. This limitation can be disabled by setting "ignoreMinLength" to true.

  • Maximum total audio input length allowed for creating a voiceprint is 300 seconds

  • Minimum audio signal-to-noise ratio (SNR) is 2 dB
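
The per-request length limits above can be checked on the client before uploading. The following is a minimal sketch using only the Python standard library; the function names and the helper that generates a silent test file are hypothetical and not part of the service. It checks only the raw audio duration: effective speech length and SNR are evaluated by the service and are not estimated here.

```python
import io
import wave
from typing import List

MIN_AUDIO_SEC = 1.0    # minimum audio input length per request
MAX_AUDIO_SEC = 120.0  # maximum audio input length per request


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Return the duration of a PCM WAV payload in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_request_length(wav_bytes: bytes) -> List[str]:
    """Return a list of human-readable problems; empty if the length is acceptable."""
    problems = []
    duration = wav_duration_seconds(wav_bytes)
    if duration < MIN_AUDIO_SEC:
        problems.append(f"audio is {duration:.2f} s, shorter than {MIN_AUDIO_SEC} s")
    if duration > MAX_AUDIO_SEC:
        problems.append(f"audio is {duration:.2f} s, longer than {MAX_AUDIO_SEC} s")
    return problems


def make_silent_wav(seconds: float, rate: int = 16000) -> bytes:
    """Hypothetical helper: build a mono 16-bit silent WAV for local testing."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()
```

A half-second file yields one problem and a two-second file yields none. A file that passes this check can still be rejected by the service, for example for a low SNR.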

POST {endpoint}/speaker-recognition/verification/text-independent/profiles/{profileId}/enrollments?api-version=2021-09-05

With optional parameters:

POST {endpoint}/speaker-recognition/verification/text-independent/profiles/{profileId}/enrollments?api-version=2021-09-05&ignoreMinLength={ignoreMinLength}

URI Parameters

| Name | In | Required | Type | Description |
|---|---|---|---|---|
| endpoint | path | True | string | Supported Cognitive Services endpoints (protocol and hostname, for example: https://westus.api.cognitive.microsoft.com). |
| profileId | path | True | string (uuid) | Unique identifier of the profile (GUID). |
| api-version | query | True | string | Specifies the version of the operation to use for this request. |
| ignoreMinLength | query | | boolean | If true, a voice print will be created immediately for this profile regardless of how much speech is supplied or stored. Default is false. |
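
As an illustration of how the URI parameters combine, here is a minimal sketch that assembles the request URL with the Python standard library. The function name is hypothetical, and serializing the boolean as lowercase "true"/"false" is an assumption rather than something this page specifies.

```python
from typing import Optional
from urllib.parse import quote, urlencode

API_VERSION = "2021-09-05"


def build_enrollment_url(
    endpoint: str,
    profile_id: str,
    ignore_min_length: Optional[bool] = None,
) -> str:
    """Build the Create Enrollment URL from the documented URI parameters."""
    path = (
        "/speaker-recognition/verification/text-independent/profiles/"
        f"{quote(profile_id, safe='')}/enrollments"
    )
    query = {"api-version": API_VERSION}
    if ignore_min_length is not None:
        # Assumption: the service accepts lowercase boolean literals.
        query["ignoreMinLength"] = "true" if ignore_min_length else "false"
    return f"{endpoint.rstrip('/')}{path}?{urlencode(query)}"
```

Leaving `ignore_min_length` as `None` omits the optional parameter, so the service default (false) applies.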

Request Header

Media Types: "audio/wav; codecs=audio/pcm"

| Name | Required | Type | Description |
|---|---|---|---|
| Ocp-Apim-Subscription-Key | True | string | |

Request Body

Media Types: "audio/wav; codecs=audio/pcm"

| Name | Type | Description |
|---|---|---|
| audioData | object | Binary audio file. Supported formats are audio/wav; codecs=audio/pcm. Supports audio up to 5MB. |
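
The header and body requirements can be put together as follows. This is a minimal sketch with `urllib.request` from the Python standard library; the function name is hypothetical, and the request is only constructed, not sent.

```python
import urllib.request


def build_enrollment_request(
    url: str, subscription_key: str, wav_bytes: bytes
) -> urllib.request.Request:
    """Create a POST request carrying the WAV audio as the raw request body."""
    return urllib.request.Request(
        url,
        data=wav_bytes,
        method="POST",
        headers={
            "Ocp-Apim-Subscription-Key": subscription_key,
            "Content-Type": "audio/wav; codecs=audio/pcm",
        },
    )
```

To send it, pass the returned object to `urllib.request.urlopen`; non-success status codes are raised as `urllib.error.HTTPError`, whose body can be read for the error details.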

Responses

| Name | Type | Description |
|---|---|---|
| 201 Created | TiEnrollmentInfo | Created |
| Other Status Codes | SpeakerErrorInfo | Failure. Headers: x-ms-error-code: string |

Security

Ocp-Apim-Subscription-Key

Type: apiKey
In: header

Examples

Successful Query

Sample Request

POST https://westus.api.cognitive.microsoft.com/speaker-recognition/verification/text-independent/profiles/49a36324-fc4b-4387-aa06-090cfbf0064f/enrollments?api-version=2021-09-05


"{binary file data}"

Sample Response

Status code: 201 Created

Content-Type: application/json
{
  "profileId": "49a36324-fc4b-4387-aa06-090cfbf0064f",
  "enrollmentStatus": "Enrolling",
  "enrollmentsCount": 1,
  "enrollmentsLengthInSec": 1.83,
  "enrollmentsSpeechLengthInSec": 1.35,
  "remainingEnrollmentsSpeechLengthInSec": 18.65,
  "audioLengthInSec": 1.83,
  "audioSpeechLengthInSec": 1.35
}

Other status codes (failure)

Content-Type: application/json
x-ms-error-code: Error Code
{
  "error": {
    "code": "Error Code",
    "message": "Error Message"
  }
}
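
A client typically needs to tell the two response shapes apart. The sketch below, using only the `json` module, summarizes either body; the function name and the wording of the summary strings are hypothetical.

```python
import json


def summarize_response(status_code: int, body: str) -> str:
    """Turn a Create Enrollment response body into a one-line summary."""
    payload = json.loads(body)
    if status_code == 201:
        # Success: the body is a TiEnrollmentInfo object.
        return (
            f"{payload['enrollmentStatus']}: "
            f"{payload['enrollmentsCount']} enrollment(s), "
            f"{payload['remainingEnrollmentsSpeechLengthInSec']} s of speech still needed"
        )
    # Failure: the body is a SpeakerErrorInfo object wrapping an Error.
    error = payload.get("error", {})
    return f"error {error.get('code')}: {error.get('message')}"
```

Applied to the successful sample above, this returns "Enrolling: 1 enrollment(s), 18.65 s of speech still needed".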

Definitions

| Name | Description |
|---|---|
| Error | |
| SpeakerErrorInfo | Speaker error message |
| TiEnrollmentInfo | Text-Independent Speaker profile enrollment info |
| TrainingStatusType | Status representing the current state of the profile. Available values are Enrolling (the profile has no voice print and is not ready for recognition requests), Training (the voice print of the profile is being created and can’t be used for recognition at the moment), and Enrolled (the profile has a voice print and is ready for recognition requests). |

Error

| Name | Type | Description |
|---|---|---|
| code | string | |
| message | string | |

SpeakerErrorInfo

Speaker error message

| Name | Type | Description |
|---|---|---|
| error | Error | |

TiEnrollmentInfo

Text-Independent Speaker profile enrollment info

| Name | Type | Description |
|---|---|---|
| audioLengthInSec | number | Length of this enrollment audio in seconds. |
| audioSpeechLengthInSec | number | Length of pure speech in this enrollment audio, in seconds. Pure speech is the audio that remains after removing silence and non-speech segments. |
| enrollmentStatus | TrainingStatusType | Status representing the current state of the profile. Available values are Enrolling (the profile has no voice print and is not ready for recognition requests), Training (the voice print of the profile is being created and can’t be used for recognition at the moment), and Enrolled (the profile has a voice print and is ready for recognition requests). |
| enrollmentsCount | integer | Number of enrollment audios accepted for this profile. |
| enrollmentsLengthInSec | number | Total length of enrollment audios accepted for this profile in seconds. |
| enrollmentsSpeechLengthInSec | number | Total pure speech (audio after removing silence and non-speech segments) across all enrollments of the profile, in seconds. |
| profileId | string | Unique identifier of the profile (GUID). |
| remainingEnrollmentsSpeechLengthInSec | number | Amount of pure speech (audio after removing silence and non-speech segments), in seconds, still needed to complete profile enrollment. |
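
The sample response on this page is consistent with the remaining speech being the 20-second minimum minus the speech accumulated so far (20 - 1.35 = 18.65). The sketch below encodes that reading; it is an inference from the sample values, not a documented formula, and the function name is hypothetical. Clients should rely on the value the service returns.

```python
MIN_SPEECH_SEC = 20.0  # minimum total effective speech length for a voice print


def estimated_remaining_speech(enrollments_speech_sec: float) -> float:
    """Estimate the pure speech still needed, never going below zero."""
    remaining = MIN_SPEECH_SEC - enrollments_speech_sec
    return max(0.0, round(remaining, 2))
```

For example, `estimated_remaining_speech(1.35)` gives 18.65, matching `remainingEnrollmentsSpeechLengthInSec` in the sample response.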

TrainingStatusType

Status representing the current state of the profile. Available values are:

  • Enrolling: the profile has no voice print and is not ready for recognition requests.
  • Training: the voice print of the profile is being created and can’t be used for recognition at the moment.
  • Enrolled: the profile has a voice print and is ready for recognition requests.

| Name | Type | Description |
|---|---|---|
| Enrolled | string | |
| Enrolling | string | |
| Training | string | |
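
On the client side, the three status values map naturally onto an enumeration. This is a minimal sketch; the helper name is hypothetical, and it only mirrors the values listed above.

```python
from enum import Enum


class TrainingStatusType(str, Enum):
    """Profile states as returned in the enrollmentStatus field."""

    ENROLLING = "Enrolling"  # no voice print yet
    TRAINING = "Training"    # voice print is being created
    ENROLLED = "Enrolled"    # voice print exists


def is_ready_for_recognition(status: str) -> bool:
    """True only when the profile has a voice print; raises ValueError on unknown values."""
    return TrainingStatusType(status) is TrainingStatusType.ENROLLED
```

Converting through the enum makes an unexpected status string fail loudly instead of being silently treated as "not ready".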