Get a speaker profile ID for the personal voice
To use personal voice in your application, you need to get a speaker profile ID. The speaker profile ID is used to generate synthesized audio with the text input provided.
You create a speaker profile ID based on the speaker's verbal consent statement and an audio prompt (a clean human voice sample between 5 and 90 seconds long). The user's voice characteristics are encoded in the speakerProfileId property that's used for text to speech. For more information, see use personal voice in your application.
Note
The personal voice ID and speaker profile ID aren't the same. You can choose the personal voice ID, but the speaker profile ID is generated by the service. The personal voice ID is used to manage the personal voice. The speaker profile ID is used for text to speech.
You provide the audio files from a publicly accessible URL (PersonalVoices_Create) or upload the audio files (PersonalVoices_Post).
Prompt audio format
The supported formats for prompt audio files are:
Format | Sample rate | Bit rate | Bit depth |
---|---|---|---|
mp3 | 16 kHz, 24 kHz, 44.1 kHz, 48 kHz | 128 kbps, 192 kbps, 256 kbps, 320 kbps | / |
wav | 16 kHz, 24 kHz, 44.1 kHz, 48 kHz | / | 16-bit, 24-bit, 32-bit |
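Before uploading, it can help to check a wav prompt against the table and the 5 to 90 second guideline locally. The following is a minimal sketch using Python's standard wave module. The function name check_wav_prompt is hypothetical, and applying the duration limit to each file individually is an assumption. The wave module reads only PCM data, so other encodings raise wave.Error instead of returning a result.

```python
import wave

# Values taken from the wav row of the table above.
SUPPORTED_SAMPLE_RATES = {16000, 24000, 44100, 48000}
SUPPORTED_BIT_DEPTHS = {16, 24, 32}
# Assumption: the 5 to 90 second guideline is checked per file.
MIN_SECONDS, MAX_SECONDS = 5.0, 90.0


def check_wav_prompt(path):
    """Return a list of problems found in a PCM wav prompt file (empty if none)."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        bits = wav.getsampwidth() * 8
        seconds = wav.getnframes() / rate

    problems = []
    if rate not in SUPPORTED_SAMPLE_RATES:
        problems.append(f"unsupported sample rate: {rate} Hz")
    if bits not in SUPPORTED_BIT_DEPTHS:
        problems.append(f"unsupported bit depth: {bits}-bit")
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        problems.append(f"duration {seconds:.1f} s is outside 5-90 s")
    return problems
```

An empty list means the file passed these local checks. It doesn't guarantee that the service accepts the audio.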
Create personal voice from a file
In this scenario, the audio files must be available locally.
To create a personal voice and get the speaker profile ID, use the PersonalVoices_Post operation of the custom voice API. Construct the request body according to the following instructions:
- Set the required projectId property. See create a project.
- Set the required consentId property. See add user consent.
- Set the required audiodata property. You can specify one or more audio files in the same request.
Make an HTTP POST request using the URI as shown in the following PersonalVoices_Post example.
- Replace YourResourceKey with your Speech resource key.
- Replace YourResourceRegion with your Speech resource region.
- Replace JessicaPersonalVoiceId with a personal voice ID of your choice. The case-sensitive ID is used in the personal voice's URI and can't be changed later.
curl -v -X POST -H "Ocp-Apim-Subscription-Key: YourResourceKey" -F 'projectId="ProjectId"' -F 'consentId="JessicaConsentId"' -F 'audiodata=@"D:\PersonalVoiceTest\CNVSample001.wav"' -F 'audiodata=@"D:\PersonalVoiceTest\CNVSample002.wav"' "https://YourResourceRegion.api.cognitive.microsoft.com/customvoice/personalvoices/JessicaPersonalVoiceId?api-version=2024-02-01-preview"
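If you prefer to call the operation from code, the same multipart request can be assembled with Python's standard library. This is a sketch rather than an official client. The function name build_post_request is hypothetical, and the application/octet-stream content type for each file part is an assumption. The URI pattern, header, and field names mirror the curl example.

```python
import uuid
import urllib.request
from pathlib import Path

API_VERSION = "2024-02-01-preview"  # taken from the curl example


def build_post_request(region, key, voice_id, project_id, consent_id, audio_paths):
    """Build (but don't send) a multipart/form-data PersonalVoices_Post request."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain text form fields.
    for name, value in (("projectId", project_id), ("consentId", consent_id)):
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    # One audiodata part per file; the field name repeats for each file.
    for path in map(Path, audio_paths):
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="audiodata"; filename="{path.name}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"  # assumption
        )
        parts.append(header.encode() + path.read_bytes() + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())

    url = (
        f"https://{region}.api.cognitive.microsoft.com/customvoice/"
        f"personalvoices/{voice_id}?api-version={API_VERSION}"
    )
    return urllib.request.Request(
        url,
        data=b"".join(parts),
        method="POST",
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send the request, pass the returned object to urllib.request.urlopen and read the response body and headers from the result.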
You should receive a response body in the following format:
{
  "id": "JessicaPersonalVoiceId",
  "speakerProfileId": "3059912f-a3dc-49e3-bdd0-02e449df1fe3",
  "projectId": "ProjectId",
  "consentId": "JessicaConsentId",
  "status": "NotStarted",
  "createdDateTime": "2024-09-01T05:30:00.000Z",
  "lastActionDateTime": "2024-09-02T10:15:30.000Z"
}
Use the speakerProfileId property to integrate personal voice in your text to speech application. For more information, see use personal voice in your application.
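Because the personal voice ID and the speaker profile ID are easy to confuse, it can be useful to pull both out of the response explicitly. The following sketch parses a response body in the format shown above. The function name parse_personal_voice is hypothetical.

```python
import json


def parse_personal_voice(body):
    """Return (personal voice ID, speaker profile ID, status) from a response body."""
    voice = json.loads(body)
    missing = [k for k in ("id", "speakerProfileId", "status") if k not in voice]
    if missing:
        raise ValueError(f"response is missing: {', '.join(missing)}")
    # "id" is the ID you chose; "speakerProfileId" is generated by the service.
    return voice["id"], voice["speakerProfileId"], voice["status"]
```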
The response header contains the Operation-Location property. Use this URI to get details about the PersonalVoices_Post operation. Here's an example of the response header:
Operation-Location: https://eastus.api.cognitive.microsoft.com/customvoice/operations/1321a2c0-9be4-471d-83bb-bc3be4f96a6f?api-version=2024-02-01-preview
Operation-Id: 1321a2c0-9be4-471d-83bb-bc3be4f96a6f
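One way to use the Operation-Location URI is to poll it until the operation finishes. The sketch below makes several assumptions that this article doesn't confirm: that a GET on the URI returns JSON with a status field, and that Succeeded and Failed are the terminal values. The names fetch_json and wait_for_operation are hypothetical. Check the operation response format in the API reference before relying on this.

```python
import json
import time
import urllib.request


def fetch_json(url, key):
    """GET a URL with the Speech resource key and decode the JSON body."""
    req = urllib.request.Request(url, headers={"Ocp-Apim-Subscription-Key": key})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_operation(operation_url, key, fetch=fetch_json, interval=5.0, max_polls=60):
    """Poll the Operation-Location URI until a terminal status is reported."""
    for _ in range(max_polls):
        # Assumption: the operation resource exposes a "status" field.
        status = fetch(operation_url, key).get("status")
        # Assumption: "Succeeded" and "Failed" are the terminal values.
        if status in ("Succeeded", "Failed"):
            return status
        time.sleep(interval)
    raise TimeoutError(f"operation still not finished after {max_polls} polls")
```

The fetch parameter lets you substitute a different HTTP client or a stub for testing.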
Create personal voice from a URL
In this scenario, the audio files must already be stored in an Azure Blob Storage container.
To create a personal voice and get the speaker profile ID, use the PersonalVoices_Create operation of the custom voice API. Construct the request body according to the following instructions:
- Set the required projectId property. See create a project.
- Set the required consentId property. See add user consent.
- Set the required audios property. Within the audios property, set the following properties:
  - Set the required containerUrl property to the URL of the Azure Blob Storage container that contains the audio files. Use a shared access signature (SAS) for a container with both read and list permissions.
  - Set the required extensions property to the extensions of the audio files.
  - Optionally, set the prefix property to set a prefix for the blob name.
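The request body described above can be sketched in Python as follows. This isn't an official client. The function name build_create_request is hypothetical, and the URI pattern and API version are taken from this article's PersonalVoices_Create example. The function builds the request without sending it.

```python
import json
import urllib.request

API_VERSION = "2024-02-01-preview"  # taken from this article's example


def build_create_request(region, key, voice_id, project_id, consent_id,
                         container_url, extensions, prefix=None):
    """Build (but don't send) a PersonalVoices_Create PUT request."""
    audios = {"containerUrl": container_url, "extensions": list(extensions)}
    if prefix is not None:
        audios["prefix"] = prefix  # optional blob name prefix

    body = {"projectId": project_id, "consentId": consent_id, "audios": audios}
    url = (
        f"https://{region}.api.cognitive.microsoft.com/customvoice/"
        f"personalvoices/{voice_id}?api-version={API_VERSION}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
    )
```

To send the request, pass the returned object to urllib.request.urlopen.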
Make an HTTP PUT request using the URI as shown in the following PersonalVoices_Create example.
- Replace YourResourceKey with your Speech resource key.
- Replace YourResourceRegion with your Speech resource region.
- Replace JessicaPersonalVoiceId with a personal voice ID of your choice. The case-sensitive ID is used in the personal voice's URI and can't be changed later.
curl -v -X PUT -H "Ocp-Apim-Subscription-Key: YourResourceKey" -H "Content-Type: application/json" -d '{
  "projectId": "ProjectId",
  "consentId": "JessicaConsentId",
  "audios": {
    "containerUrl": "https://contoso.blob.core.windows.net/voicecontainer?mySasToken",
    "prefix": "jessica/",
    "extensions": [
      ".wav"
    ]
  }
} ' "https://YourResourceRegion.api.cognitive.microsoft.com/customvoice/personalvoices/JessicaPersonalVoiceId?api-version=2024-02-01-preview"
# Ensure the `containerUrl` has both read and list permissions.
# Ensure the `.wav` files are located in the "jessica" folder within the container. The `prefix` matches all `.wav` files in the "jessica" folder. If there is no such folder, the prefix will match `.wav` files with names starting with "jessica".
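The two checks in the preceding comments can be approximated locally before you send the request. The sketch below reads the sp (signed permissions) query parameter of a SAS URL, where r means read and l means list, and filters a list of blob names by prefix and extension. Both function names are hypothetical. The filter is a local illustration that assumes case-sensitive matching, not the service's own matching logic. A token that relies on a stored access policy carries no sp parameter, so the permission check reports False for it.

```python
from urllib.parse import urlparse, parse_qs


def sas_has_read_and_list(container_url):
    """True if the SAS token's signed permissions (sp) include read and list."""
    query = parse_qs(urlparse(container_url).query)
    permissions = query.get("sp", [""])[0]
    return "r" in permissions and "l" in permissions


def matching_blobs(blob_names, prefix, extensions):
    """Return the blob names that start with the prefix and end with an extension."""
    # Assumption: matching is case-sensitive.
    suffixes = tuple(extensions)
    return [n for n in blob_names if n.startswith(prefix) and n.endswith(suffixes)]
```

For example, with the prefix jessica/ and the extension .wav, a blob named jessica/sample001.wav is selected, while other/sample001.wav and jessica/notes.txt aren't.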
You should receive a response body in the following format:
{
  "id": "JessicaPersonalVoiceId",
  "speakerProfileId": "3059912f-a3dc-49e3-bdd0-02e449df1fe3",
  "projectId": "ProjectId",
  "consentId": "JessicaConsentId",
  "status": "NotStarted",
  "createdDateTime": "2024-09-01T05:30:00.000Z",
  "lastActionDateTime": "2024-09-02T10:15:30.000Z"
}
Use the speakerProfileId property to integrate personal voice in your text to speech application. For more information, see use personal voice in your application.
The response header contains the Operation-Location property. Use this URI to get details about the PersonalVoices_Create operation. Here's an example of the response header:
Operation-Location: https://eastus.api.cognitive.microsoft.com/customvoice/operations/1321a2c0-9be4-471d-83bb-bc3be4f96a6f?api-version=2024-02-01-preview
Operation-Id: 1321a2c0-9be4-471d-83bb-bc3be4f96a6f