Quickstart: Text to speech with the Azure OpenAI Service

Стаття
12/27/2024

In this quickstart, you use the Azure OpenAI Service for text to speech with OpenAI voices.

The available voices are: alloy, echo, fable, onyx, nova, and shimmer. For more information, see Azure OpenAI Service reference documentation for text to speech.

Prerequisites

An Azure subscription - Create one for free.
An Azure OpenAI resource created in the North Central US or Sweden Central regions with the tts-1 or tts-1-hd model deployed. For more information, see Create a resource and deploy a model with Azure OpenAI.

Set up

Retrieve key and endpoint

To successfully make a call against Azure OpenAI, you need an endpoint and a key.

Variable name	Value
`AZURE_OPENAI_ENDPOINT`	The service endpoint can be found in the Keys & Endpoint section when examining your resource from the Azure portal. Alternatively, you can find the endpoint via the Deployments page in Azure AI Foundry portal. An example endpoint is: `https://docs-test-001.openai.azure.com/`.
`AZURE_OPENAI_API_KEY`	This value can be found in the Keys & Endpoint section when examining your resource from the Azure portal. You can use either `KEY1` or `KEY2`.

Go to your resource in the Azure portal. The Endpoint and Keys can be found in the Resource Management section. Copy your endpoint and access key as you need both for authenticating your API calls. You can use either KEY1 or KEY2. Always having two keys allows you to securely rotate and regenerate keys without causing a service disruption.

Environment variables

Create and assign persistent environment variables for your key and endpoint.

Important

Use API keys with caution. Don't include the API key directly in your code, and never post it publicly. If you use an API key, store it securely in Azure Key Vault. For more information about using API keys securely in your apps, see API keys with Azure Key Vault.

For more information about AI services security, see Authenticate requests to Azure AI services.

setx AZURE_OPENAI_API_KEY "REPLACE_WITH_YOUR_KEY_VALUE_HERE"

setx AZURE_OPENAI_ENDPOINT "REPLACE_WITH_YOUR_ENDPOINT_HERE"

[System.Environment]::SetEnvironmentVariable('AZURE_OPENAI_API_KEY', 'REPLACE_WITH_YOUR_KEY_VALUE_HERE', 'User')

[System.Environment]::SetEnvironmentVariable('AZURE_OPENAI_ENDPOINT', 'REPLACE_WITH_YOUR_ENDPOINT_HERE', 'User')

echo export AZURE_OPENAI_API_KEY="REPLACE_WITH_YOUR_KEY_VALUE_HERE" >> /etc/environment && source /etc/environment

echo export AZURE_OPENAI_ENDPOINT="REPLACE_WITH_YOUR_ENDPOINT_HERE" >> /etc/environment && source /etc/environment

Create a REST request and response

In a bash shell, run the following command. You need to replace YourDeploymentName with the deployment name you chose when you deployed the text to speech model. The deployment name isn't necessarily the same as the model name. Entering the model name results in an error unless you chose a deployment name that is identical to the underlying model name.

curl $AZURE_OPENAI_ENDPOINT/openai/deployments/YourDeploymentName/audio/speech?api-version=2024-02-15-preview \
 -H "api-key: $AZURE_OPENAI_API_KEY" \
 -H "Content-Type: application/json" \
 -d '{
    "model": "tts-1-hd",
    "input": "I'm excited to try text to speech.",
    "voice": "alloy"
}' --output speech.mp3

The format of your first line of the command with an example endpoint would appear as follows curl https://aoai-docs.openai.azure.com/openai/deployments/{YourDeploymentName}/audio/speech?api-version=2024-02-15-preview \.

Important

For production, use a secure way of storing and accessing your credentials like Azure Key Vault. For more information about credential security, see the Azure AI services security article.

Source code | Package (npm) | Samples

Prerequisites

An Azure subscription - Create one for free
LTS versions of Node.js
Azure CLI used for passwordless authentication in a local development environment, create the necessary context by signing in with the Azure CLI.
An Azure OpenAI resource created in a supported region (see Region availability). For more information, see Create a resource and deploy a model with Azure OpenAI.

Microsoft Entra ID prerequisites

For the recommended keyless authentication with Microsoft Entra ID, you need to:

Install the Azure CLI used for keyless authentication with Microsoft Entra ID.
Assign the Cognitive Services User role to your user account. You can assign roles in the Azure portal under Access control (IAM) > Add role assignment.

Retrieve resource information

You need to retrieve the following information to authenticate your application with your Azure OpenAI resource:

Microsoft Entra ID
API key

Variable name	Value
`AZURE_OPENAI_ENDPOINT`	This value can be found in the Keys and Endpoint section when examining your resource from the Azure portal.
`AZURE_OPENAI_DEPLOYMENT_NAME`	This value will correspond to the custom name you chose for your deployment when you deployed a model. This value can be found under Resource Management > Model Deployments in the Azure portal.
`OPENAI_API_VERSION`	Learn more about API Versions.

Learn more about keyless authentication and setting environment variables.

Variable name	Value
`AZURE_OPENAI_ENDPOINT`	This value can be found in the Keys and Endpoint section when examining your resource from the Azure portal.
`AZURE_OPENAI_API_KEY`	This value can be found in the Keys and Endpoint section when examining your resource from the Azure portal. You can use either `KEY1` or `KEY2`.
`AZURE_OPENAI_DEPLOYMENT_NAME`	This value will correspond to the custom name you chose for your deployment when you deployed a model. This value can be found under Resource Management > Model Deployments in the Azure portal.
`OPENAI_API_VERSION`	Learn more about API Versions.

Learn more about finding API keys and setting environment variables.

Important

For more information about AI services security, see Authenticate requests to Azure AI services.

Caution

To use the recommended keyless authentication with the SDK, make sure that the AZURE_OPENAI_API_KEY environment variable isn't set.

Create a Node application

In a console window (such as cmd, PowerShell, or Bash), create a new directory for your app, and navigate to it. Then run the npm init command to create a node application with a package.json file.

npm init

Install the client library

Install the client libraries with:

npm install openai @azure/identity

Your app's package.json file will be updated with the dependencies.

Create a new file named Text-to-speech.js and open it in your preferred code editor. Copy the following code into the Text-to-speech.js file:

const { writeFile } = require("fs/promises");
const { AzureOpenAI } = require("openai");
const { DefaultAzureCredential, getBearerTokenProvider } = require("@azure/identity");
require("openai/shims/node");

// You will need to set these environment variables or edit the following values
const endpoint = process.env["AZURE_OPENAI_ENDPOINT"] || "<endpoint>";
const speechFilePath = "<path to save the speech file>";

// Required Azure OpenAI deployment name and API version
const deploymentName = "tts";
const apiVersion = "2024-08-01-preview";

// keyless authentication    
const credential = new DefaultAzureCredential();
const scope = "https://cognitiveservices.azure.com/.default";
const azureADTokenProvider = getBearerTokenProvider(credential, scope);

function getClient() {
  return new AzureOpenAI({
    endpoint,
    azureADTokenProvider,
    apiVersion,
    deployment: deploymentName,
  });
}

async function generateAudioStream(
  client,
  params
) {
  const response = await client.audio.speech.create(params);
  if (response.ok) return response.body;
  throw new Error(`Failed to generate audio stream: ${response.statusText}`);
}
export async function main() {
  console.log("== Text to Speech Sample ==");

  const client = getClient();
  const streamToRead = await generateAudioStream(client, {
    model: deploymentName,
    voice: "alloy",
    input: "the quick brown chicken jumped over the lazy dogs",
  });

  console.log(`Streaming response to ${speechFilePath}`);
  await writeFile(speechFilePath, streamToRead);
  console.log("Finished streaming");
}

main().catch((err) => {
  console.error("The sample encountered an error:", err);
});

Run the script with the following command:
```
node Text-to-speech.js
```

Create a new file named Text-to-speech.js and open it in your preferred code editor. Copy the following code into the Text-to-speech.js file:

const { writeFile } = require("fs/promises");
const { AzureOpenAI } = require("openai");
require("openai/shims/node");

// You will need to set these environment variables or edit the following values
const endpoint = process.env["AZURE_OPENAI_ENDPOINT"] || "<endpoint>";
const apiKey = process.env["AZURE_OPENAI_API_KEY"] || "<api key>";
const speechFilePath = "<path to save the speech file>";

// Required Azure OpenAI deployment name and API version
const deploymentName = "tts";
const apiVersion = "2024-08-01-preview";

function getClient() {
  return new AzureOpenAI({
    endpoint,
    apiKey,
    apiVersion,
    deployment: deploymentName,
  });
}

async function generateAudioStream(
  client,
  params
) {
  const response = await client.audio.speech.create(params);
  if (response.ok) return response.body;
  throw new Error(`Failed to generate audio stream: ${response.statusText}`);
}
export async function main() {
  console.log("== Text to Speech Sample ==");

  const client = getClient();
  const streamToRead = await generateAudioStream(client, {
    model: deploymentName,
    voice: "alloy",
    input: "the quick brown chicken jumped over the lazy dogs",
  });

  console.log(`Streaming response to ${speechFilePath}`);
  await writeFile(speechFilePath, streamToRead);
  console.log("Finished streaming");
}

main().catch((err) => {
  console.error("The sample encountered an error:", err);
});

Important

For more information about AI services security, see Authenticate requests to Azure AI services.

Run the script with the following command:
```
node Text-to-speech.js
```

Source code | Package (npm) | Samples

Prerequisites

An Azure subscription - Create one for free
LTS versions of Node.js
TypeScript
Azure CLI used for passwordless authentication in a local development environment, create the necessary context by signing in with the Azure CLI.
An Azure OpenAI resource created in a supported region (see Region availability). For more information, see Create a resource and deploy a model with Azure OpenAI.

Microsoft Entra ID prerequisites

For the recommended keyless authentication with Microsoft Entra ID, you need to:

Install the Azure CLI used for keyless authentication with Microsoft Entra ID.
Assign the Cognitive Services User role to your user account. You can assign roles in the Azure portal under Access control (IAM) > Add role assignment.

Retrieve resource information

You need to retrieve the following information to authenticate your application with your Azure OpenAI resource:

Microsoft Entra ID
API key

Variable name	Value
`AZURE_OPENAI_ENDPOINT`	This value can be found in the Keys and Endpoint section when examining your resource from the Azure portal.
`AZURE_OPENAI_DEPLOYMENT_NAME`	This value will correspond to the custom name you chose for your deployment when you deployed a model. This value can be found under Resource Management > Model Deployments in the Azure portal.
`OPENAI_API_VERSION`	Learn more about API Versions.

Learn more about keyless authentication and setting environment variables.

Variable name	Value
`AZURE_OPENAI_ENDPOINT`	This value can be found in the Keys and Endpoint section when examining your resource from the Azure portal.
`AZURE_OPENAI_API_KEY`	This value can be found in the Keys and Endpoint section when examining your resource from the Azure portal. You can use either `KEY1` or `KEY2`.
`AZURE_OPENAI_DEPLOYMENT_NAME`	This value will correspond to the custom name you chose for your deployment when you deployed a model. This value can be found under Resource Management > Model Deployments in the Azure portal.
`OPENAI_API_VERSION`	Learn more about API Versions.

Learn more about finding API keys and setting environment variables.

Important

For more information about AI services security, see Authenticate requests to Azure AI services.

Caution

To use the recommended keyless authentication with the SDK, make sure that the AZURE_OPENAI_API_KEY environment variable isn't set.

Create a Node application

npm init

Install the client library

Install the client libraries with:

npm install openai @azure/identity

Your app's package.json file will be updated with the dependencies.

Create a speech file

Microsoft Entra ID
API key

Create a new file named Text-to-speech.ts and open it in your preferred code editor. Copy the following code into the Text-to-speech.ts file:

import { writeFile } from "fs/promises";
import { AzureOpenAI } from "openai";
import { DefaultAzureCredential, getBearerTokenProvider } from "@azure/identity";
import type { SpeechCreateParams } from "openai/resources/audio/speech";
import "openai/shims/node";

// You will need to set these environment variables or edit the following values
const endpoint = process.env["AZURE_OPENAI_ENDPOINT"] || "<endpoint>";
const speechFilePath = "<path to save the speech file>";

// Required Azure OpenAI deployment name and API version
const deploymentName = "tts";
const apiVersion = "2024-08-01-preview";

// keyless authentication    
const credential = new DefaultAzureCredential();
const scope = "https://cognitiveservices.azure.com/.default";
const azureADTokenProvider = getBearerTokenProvider(credential, scope);

function getClient(): AzureOpenAI {
  return new AzureOpenAI({
    endpoint,
    azureADTokenProvider,
    apiVersion,
    deployment: deploymentName,
  });
}

async function generateAudioStream(
  client: AzureOpenAI,
  params: SpeechCreateParams
): Promise<NodeJS.ReadableStream> {
  const response = await client.audio.speech.create(params);
  if (response.ok) return response.body;
  throw new Error(`Failed to generate audio stream: ${response.statusText}`);
}
export async function main() {
  console.log("== Text to Speech Sample ==");

  const client = getClient();
  const streamToRead = await generateAudioStream(client, {
    model: deploymentName,
    voice: "alloy",
    input: "the quick brown chicken jumped over the lazy dogs",
  });

  console.log(`Streaming response to ${speechFilePath}`);
  await writeFile(speechFilePath, streamToRead);
  console.log("Finished streaming");
}

main().catch((err) => {
  console.error("The sample encountered an error:", err);
});

The import of "openai/shims/node" is necessary when running the code in a Node.js environment. It ensures that the output type of the client.audio.speech.create method is correctly set to NodeJS.ReadableStream.

Build the application with the following command:
```
tsc
```
Run the application with the following command:
```
node Text-to-speech.js
```

Create a new file named Text-to-speech.ts and open it in your preferred code editor. Copy the following code into the Text-to-speech.ts file:

import { writeFile } from "fs/promises";
import { AzureOpenAI } from "openai";
import type { SpeechCreateParams } from "openai/resources/audio/speech";
import "openai/shims/node";

// You will need to set these environment variables or edit the following values
const endpoint = "<endpoint>";
const apiKey = process.env["AZURE_OPENAI_API_KEY"] || "<api key>";
const speechFilePath =
  process.env["SPEECH_FILE_PATH"] || "<path to save the speech file>";

// Required Azure OpenAI deployment name and API version
const deploymentName = "tts";
const apiVersion = "2024-08-01-preview";

function getClient(): AzureOpenAI {
  return new AzureOpenAI({
    endpoint,
    apiKey,
    apiVersion,
    deployment: deploymentName,
  });
}

async function generateAudioStream(
  client: AzureOpenAI,
  params: SpeechCreateParams
): Promise<NodeJS.ReadableStream> {
  const response = await client.audio.speech.create(params);
  if (response.ok) return response.body;
  throw new Error(`Failed to generate audio stream: ${response.statusText}`);
}
export async function main() {
  console.log("== Text to Speech Sample ==");

  const client = getClient();
  const streamToRead = await generateAudioStream(client, {
    model: deploymentName,
    voice: "alloy",
    input: "the quick brown chicken jumped over the lazy dogs",
  });

  console.log(`Streaming response to ${speechFilePath}`);
  await writeFile(speechFilePath, streamToRead);
  console.log("Finished streaming");
}

main().catch((err) => {
  console.error("The sample encountered an error:", err);
});

Build the application with the following command:
```
tsc
```
Run the application with the following command:
```
node Text-to-speech.js
```

Important

For more information about AI services security, see Authenticate requests to Azure AI services.

Prerequisites

An Azure subscription. You can create one for free.
An Azure OpenAI resource with a Whisper model deployed in a supported region. For more information, see Create a resource and deploy a model with Azure OpenAI.
The .NET 8.0 SDK

Create the .NET app

Create a .NET app using the dotnet new command:
```
dotnet new console -n TextToSpeech
```
Change into the directory of the new app:
```
cd OpenAISpeech
```
Install the Azure.OpenAI client library:
```
dotnet add package Azure.AI.OpenAI
```

Authenticate and connect to Azure OpenAI

To make requests to your Azure OpenAI service, you need the service endpoint as well as authentication credentials via one of the following options:

Microsoft Entra ID is the recommended approach for authenticating to Azure services and is more secure than key-based alternatives.
Access keys allow you to provide a secret key to connect to your resource.

Important

For more information about AI services security, see Authenticate requests to Azure AI services.

Get the Azure OpenAI endpoint

The service endpoint can be found in the Keys & Endpoint section when examining your resource from the Azure portal. Alternatively, you can find the endpoint via the Deployments page in Azure AI Foundry portal. An example endpoint is: https://docs-test-001.openai.azure.com/.

Authenticate using Microsoft Entra ID

If you choose to use Microsoft Entra ID authentication, you'll need to complete the following:

Add the Azure.Identity package.
```
dotnet add package Azure.Identity
```
Assign the Cognitive Services User role to your user account. This can be done in the Azure portal on your OpenAI resource under Access control (IAM) > Add role assignment.
Sign-in to Azure using Visual Studio or the Azure CLI via az login.

Authenticate using keys

The access key value can be found in the Keys & Endpoint section when examining your resource from the Azure portal. You can use either KEY1 or KEY2. Always having two keys allows you to securely rotate and regenerate keys without causing a service disruption.

Update the app code

Replace the contents of program.cs with the following code and update the placeholder values with your own.

using Azure;
using Azure.AI.OpenAI;
using Azure.Identity; // Required for Passwordless auth

var endpoint = new Uri(
    Environment.GetEnvironmentVariable("YOUR_OPENAI_ENDPOINT") ?? throw new ArgumentNullException());
var credentials = new DefaultAzureCredential();

// Use this line for key auth
// var credentials = new AzureKeyCredential(
//    Environment.GetEnvironmentVariable("YOUR_OPENAI_KEY") ?? throw new ArgumentNullException());

var deploymentName = "tts"; // Default deployment name, update with your own if necessary
var speechFilePath = "YOUR_AUDIO_FILE_PATH";

var openAIClient = new AzureOpenAIClient(endpoint, credentials);
var audioClient = openAIClient.GetAudioClient(deploymentName);

var result = await audioClient.GenerateSpeechAsync(
                "the quick brown chicken jumped over the lazy dogs");

Console.WriteLine("Streaming response to ${speechFilePath}");
await File.WriteAllBytesAsync(speechFilePath, result.Value.ToArray());
Console.WriteLine("Finished streaming");

Run the application using the dotnet run command or the run button at the top of Visual Studio:
```
dotnet run
```
The app generates an audio file at the location you specified for the speechFilePath variable. Play the file on your device to hear the generated audio.

Clean up resources

If you want to clean up and remove an Azure OpenAI resource, you can delete the resource. Before deleting the resource, you must first delete any deployed models.

Next steps

Learn more about how to work with text to speech with Azure OpenAI Service in the Azure OpenAI Service reference documentation.
For more examples, check out the Azure OpenAI Samples GitHub repository

Поділитися через

Quickstart: Text to speech with the Azure OpenAI Service

Prerequisites

Set up

Retrieve key and endpoint

Environment variables

Create a REST request and response

Prerequisites

Microsoft Entra ID prerequisites

Retrieve resource information

Create a Node application

Install the client library

Create a speech file

Prerequisites

Microsoft Entra ID prerequisites

Retrieve resource information

Create a Node application

Install the client library

Create a speech file

Prerequisites

Create the .NET app

Authenticate and connect to Azure OpenAI

Get the Azure OpenAI endpoint

Authenticate using Microsoft Entra ID

Authenticate using keys

Update the app code

Clean up resources

Next steps

Зворотний зв’язок

Додаткові ресурси