Get a speaker profile ID for the personal voice

To use personal voice in your application, you need to get a speaker profile ID. The speaker profile ID is used to generate synthesized audio with the text input provided.

You create a speaker profile ID based on the speaker's verbal consent statement and an audio prompt (a clean human voice sample between 5 and 90 seconds long). The user's voice characteristics are encoded in the speakerProfileId property that's used for text to speech. For more information, see use personal voice in your application.

Note

The personal voice ID and the speaker profile ID aren't the same. You choose the personal voice ID, but the speaker profile ID is generated by the service. The personal voice ID is used to manage the personal voice. The speaker profile ID is used for text to speech.

You can either provide the audio files from a publicly accessible URL (PersonalVoices_Create) or upload the audio files directly (PersonalVoices_Post).

Prompt audio format

The supported formats for prompt audio files are:

| Format | Sample rate                      | Bit rate                               | Bit depth              |
| mp3    | 16 kHz, 24 kHz, 44.1 kHz, 48 kHz | 128 kbps, 192 kbps, 256 kbps, 320 kbps | /                      |
| wav    | 16 kHz, 24 kHz, 44.1 kHz, 48 kHz | /                                      | 16-bit, 24-bit, 32-bit |
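
A quick local check before uploading can catch prompt files that fall outside the table above. The following is a minimal sketch, not part of the service or its SDK: the function name `check_wav_prompt` is hypothetical, it covers WAV files only (the Python standard library has no MP3 reader), and the `wave` module reads PCM data, so some 32-bit files (for example, floating-point ones) may be rejected by the module itself. It also applies the 5 to 90 second prompt length stated earlier.

```python
import wave

# Values copied from the format table and the prompt-length requirement.
SUPPORTED_SAMPLE_RATES_HZ = {16000, 24000, 44100, 48000}
SUPPORTED_BIT_DEPTHS = {16, 24, 32}
MIN_SECONDS, MAX_SECONDS = 5, 90


def check_wav_prompt(source):
    """Return a list of problems found; an empty list means the checks passed.

    `source` is a path string or a binary file object holding PCM WAV data.
    This helper is a hypothetical illustration, not a service API.
    """
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        bits = wav.getsampwidth() * 8  # sample width is reported in bytes
        frames = wav.getnframes()

    seconds = frames / rate if rate else 0.0
    problems = []
    if rate not in SUPPORTED_SAMPLE_RATES_HZ:
        problems.append(f"unsupported sample rate: {rate} Hz")
    if bits not in SUPPORTED_BIT_DEPTHS:
        problems.append(f"unsupported bit depth: {bits}-bit")
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        problems.append(
            f"duration {seconds:.1f} s is outside {MIN_SECONDS}-{MAX_SECONDS} s"
        )
    return problems
```

Calling `check_wav_prompt("CNVSample001.wav")` returns an empty list when the file passes all three checks. Passing these checks doesn't guarantee that the service accepts the file; it only rules out the mismatches listed here.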

Create personal voice from a file

In this scenario, the audio files must be available locally.

To create a personal voice and get the speaker profile ID, use the PersonalVoices_Post operation of the custom voice API. Construct the request body according to the following instructions:

  • Set the required projectId property. See create a project.
  • Set the required consentId property. See add user consent.
  • Set the required audiodata property. You can specify one or more audio files in the same request.

Make an HTTP POST request using the URI as shown in the following PersonalVoices_Post example.

  • Replace YourResourceKey with your Speech resource key.
  • Replace YourResourceRegion with your Speech resource region.
  • Replace JessicaPersonalVoiceId with a personal voice ID of your choice. The case-sensitive ID is used in the personal voice's URI and can't be changed later.

curl -v -X POST -H "Ocp-Apim-Subscription-Key: YourResourceKey" -F 'projectId="ProjectId"' -F 'consentId="JessicaConsentId"' -F 'audiodata=@"D:\PersonalVoiceTest\CNVSample001.wav"' -F 'audiodata=@"D:\PersonalVoiceTest\CNVSample002.wav"' "https://YourResourceRegion.api.cognitive.microsoft.com/customvoice/personalvoices/JessicaPersonalVoiceId?api-version=2024-02-01-preview"
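
For readers who call the API from code rather than curl, the same request might be assembled as in the following sketch. It only builds the request object and doesn't send it. The function name `build_personal_voice_post` is hypothetical, and the `application/octet-stream` content type on each audio part is an assumption about what the service accepts; the URL, header name, and form field names are taken from the curl example.

```python
import urllib.request
import uuid

API_VERSION = "2024-02-01-preview"  # version used in the curl example


def build_personal_voice_post(region, key, voice_id, project_id, consent_id,
                              audio_files):
    """Build (but do not send) a multipart/form-data POST request.

    `audio_files` is a list of (filename, bytes) pairs. Each pair becomes one
    'audiodata' part, mirroring the repeated -F 'audiodata=@...' flags.
    Hypothetical helper for illustration only.
    """
    boundary = uuid.uuid4().hex
    parts = []
    # Plain text fields: projectId and consentId.
    for name, value in (("projectId", project_id), ("consentId", consent_id)):
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )
    # One file part per audio sample (content type is an assumption).
    for filename, data in audio_files:
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="audiodata"; '
            f'filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode("utf-8")
        parts.append(header + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    url = (f"https://{region}.api.cognitive.microsoft.com/customvoice/"
           f"personalvoices/{voice_id}?api-version={API_VERSION}")
    request = urllib.request.Request(url, data=b"".join(parts), method="POST")
    request.add_header("Ocp-Apim-Subscription-Key", key)
    request.add_header("Content-Type",
                       f"multipart/form-data; boundary={boundary}")
    return request
```

Passing the returned object to `urllib.request.urlopen` would send the request. Building it separately makes it easy to inspect the URL and body first.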

You should receive a response body in the following format:

{
  "id": "JessicaPersonalVoiceId",
  "speakerProfileId": "3059912f-a3dc-49e3-bdd0-02e449df1fe3",
  "projectId": "ProjectId",
  "consentId": "JessicaConsentId",
  "status": "NotStarted",
  "createdDateTime": "2024-09-01T05:30:00.000Z",
  "lastActionDateTime": "2024-09-02T10:15:30.000Z"
}

Use the speakerProfileId property to integrate personal voice in your text to speech application. For more information, see use personal voice in your application.
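
Reading the speaker profile ID out of the response body is a small JSON lookup. The sketch below assumes the body has the shape shown above; the function name `extract_speaker_profile_id` is hypothetical.

```python
import json


def extract_speaker_profile_id(response_body):
    """Return the speakerProfileId from a personal voice response body.

    Raises ValueError if the property is missing or empty.
    Hypothetical helper for illustration only.
    """
    voice = json.loads(response_body)
    profile_id = voice.get("speakerProfileId")
    if not profile_id:
        raise ValueError("response has no speakerProfileId property")
    return profile_id


sample = """{
  "id": "JessicaPersonalVoiceId",
  "speakerProfileId": "3059912f-a3dc-49e3-bdd0-02e449df1fe3",
  "status": "NotStarted"
}"""
print(extract_speaker_profile_id(sample))
```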

The response header contains the Operation-Location property. Use this URI to get details about the PersonalVoices_Post operation. Here's an example of the response header:

Operation-Location: https://eastus.api.cognitive.microsoft.com/customvoice/operations/1321a2c0-9be4-471d-83bb-bc3be4f96a6f?api-version=2024-02-01-preview
Operation-Id: 1321a2c0-9be4-471d-83bb-bc3be4f96a6f
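
As a rough sketch of how the Operation-Location header might be used, the following builds a GET request for the operation URI and pulls the operation ID out of its path. It assumes the operation endpoint accepts the same Ocp-Apim-Subscription-Key header as the create request; both function names are hypothetical, and the request is built but not sent.

```python
import urllib.request
from urllib.parse import urlsplit


def operation_id_from_location(operation_location):
    """Return the last path segment of the Operation-Location URI."""
    path = urlsplit(operation_location).path
    return path.rstrip("/").rsplit("/", 1)[-1]


def build_operation_request(operation_location, key):
    """Build (but do not send) a GET request for the operation details.

    Assumes the same subscription key header as the create request.
    """
    request = urllib.request.Request(operation_location, method="GET")
    request.add_header("Ocp-Apim-Subscription-Key", key)
    return request


location = (
    "https://eastus.api.cognitive.microsoft.com/customvoice/operations/"
    "1321a2c0-9be4-471d-83bb-bc3be4f96a6f?api-version=2024-02-01-preview"
)
print(operation_id_from_location(location))
```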

Create personal voice from a URL

In this scenario, the audio files must already be stored in an Azure Blob Storage container.

To create a personal voice and get the speaker profile ID, use the PersonalVoices_Create operation of the custom voice API. Construct the request body according to the following instructions:

  • Set the required projectId property. See create a project.
  • Set the required consentId property. See add user consent.
  • Set the required audios property. Within the audios property, set the following properties:
    • Set the required containerUrl property to the URL of the Azure Blob Storage container that contains the audio files. Use a shared access signature (SAS) for the container with both read and list permissions.
    • Set the required extensions property to the extensions of the audio files.
    • Optionally, set the prefix property to set a prefix for the blob name.

Make an HTTP PUT request using the URI as shown in the following PersonalVoices_Create example.

  • Replace YourResourceKey with your Speech resource key.
  • Replace YourResourceRegion with your Speech resource region.
  • Replace JessicaPersonalVoiceId with a personal voice ID of your choice. The case-sensitive ID is used in the personal voice's URI and can't be changed later.

curl -v -X PUT -H "Ocp-Apim-Subscription-Key: YourResourceKey" -H "Content-Type: application/json" -d '{
  "projectId": "ProjectId",
  "consentId": "JessicaConsentId",
  "audios": {
    "containerUrl": "https://contoso.blob.core.windows.net/voicecontainer?mySasToken",
    "prefix": "jessica/", 
    "extensions": [
      ".wav"
    ]
  }
}' "https://YourResourceRegion.api.cognitive.microsoft.com/customvoice/personalvoices/JessicaPersonalVoiceId?api-version=2024-02-01-preview"

# Ensure the `containerUrl` has both read and list permissions. 
# Ensure the `.wav` files are located in the "jessica" folder within the container. The `prefix` matches all `.wav` files in the "jessica" folder. If there is no such folder, the prefix will match `.wav` files with names starting with "jessica". 
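
The same PUT request could be assembled in code roughly as follows. This sketch builds the JSON body from the properties described above and returns an unsent request object. The function name `build_personal_voice_put` is hypothetical; the URL, headers, and property names come from the curl example.

```python
import json
import urllib.request

API_VERSION = "2024-02-01-preview"  # version used in the curl example


def build_personal_voice_put(region, key, voice_id, project_id, consent_id,
                             container_url, extensions, prefix=None):
    """Build (but do not send) the PUT request for PersonalVoices_Create.

    `container_url` should already include the SAS token. `prefix` is
    optional and is left out of the body when not given.
    Hypothetical helper for illustration only.
    """
    audios = {"containerUrl": container_url, "extensions": list(extensions)}
    if prefix is not None:
        audios["prefix"] = prefix
    payload = {
        "projectId": project_id,
        "consentId": consent_id,
        "audios": audios,
    }
    url = (f"https://{region}.api.cognitive.microsoft.com/customvoice/"
           f"personalvoices/{voice_id}?api-version={API_VERSION}")
    request = urllib.request.Request(
        url, data=json.dumps(payload).encode("utf-8"), method="PUT"
    )
    request.add_header("Ocp-Apim-Subscription-Key", key)
    request.add_header("Content-Type", "application/json")
    return request
```

Leaving `prefix` out of the body when it isn't supplied keeps the payload to the required properties only.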

You should receive a response body in the following format:

{
  "id": "JessicaPersonalVoiceId",
  "speakerProfileId": "3059912f-a3dc-49e3-bdd0-02e449df1fe3",
  "projectId": "ProjectId",
  "consentId": "JessicaConsentId",
  "status": "NotStarted",
  "createdDateTime": "2024-09-01T05:30:00.000Z",
  "lastActionDateTime": "2024-09-02T10:15:30.000Z"
}

Use the speakerProfileId property to integrate personal voice in your text to speech application. For more information, see use personal voice in your application.

The response header contains the Operation-Location property. Use this URI to get details about the PersonalVoices_Create operation. Here's an example of the response header:

Operation-Location: https://eastus.api.cognitive.microsoft.com/customvoice/operations/1321a2c0-9be4-471d-83bb-bc3be4f96a6f?api-version=2024-02-01-preview
Operation-Id: 1321a2c0-9be4-471d-83bb-bc3be4f96a6f

Next steps