Synthesize speech

3 minutes

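Speech synthesis, or text to speech, is the reverse of speech to text. You submit text to a model, and the model returns an audio stream of that text being spoken.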

Models that support text-to-speech operations include:

gpt-4o-tts
gpt-4o-mini-tts

Note

Model availability varies by region. See the model region availability table in the Microsoft Foundry documentation.

Use a text-to-speech model

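As with speech-to-text models, you can use the AzureOpenAI client in the OpenAI SDK to connect to the endpoint of your Microsoft Foundry resource and submit text to a text-to-speech model for speech synthesis.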

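from pathlib import Path

from openai import AzureOpenAI

# Placeholders: replace with the values for your Microsoft Foundry resource
YOUR_FOUNDRY_ENDPOINT = "<your-foundry-endpoint>"
YOUR_FOUNDRY_KEY = "<your-foundry-key>"
YOUR_MODEL_DEPLOYMENT = "<your-model-deployment>"

# Create an AzureOpenAI client
client = AzureOpenAI(
    azure_endpoint=YOUR_FOUNDRY_ENDPOINT,
    api_key=YOUR_FOUNDRY_KEY,
    api_version="2025-03-01-preview",
)

# Path for the audio output file
speech_file_path = Path("output_speech.wav")

# Generate speech and stream the audio to the file.
# response_format is set so the audio data matches the .wav extension
# (the API returns MP3 when no format is requested).
with client.audio.speech.with_streaming_response.create(
    model=YOUR_MODEL_DEPLOYMENT,
    voice="alloy",
    input="This speech was AI-generated!",
    instructions="Speak in an upbeat, excited tone.",
    response_format="wav",
) as response:
    response.stream_to_file(speech_file_path)

print(f"Speech generated and saved to {speech_file_path}")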
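
If you call the model from several places, it can help to wrap the request in a small helper. The following is a minimal sketch, not part of the SDK: build_speech_request and synthesize_to_file are hypothetical names. It assumes the deployment accepts the response_format values documented for the OpenAI speech API (mp3, opus, aac, flac, wav, pcm), so confirm these for your own deployment. The helper infers the audio format from the output file's extension, so the file name and the audio data stay consistent.

```python
from pathlib import Path

# Assumption: response_format values documented for the OpenAI speech API
SUPPORTED_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}


def build_speech_request(text, output_path, deployment, voice="alloy", instructions=None):
    """Build keyword arguments for client.audio.speech.create (hypothetical helper)."""
    # Infer the audio format from the file extension, e.g. "out.wav" -> "wav"
    suffix = Path(output_path).suffix.lstrip(".").lower()
    if suffix not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported audio format: {suffix!r}")

    request = {
        "model": deployment,
        "voice": voice,
        "input": text,
        "response_format": suffix,
    }
    # Only send instructions when the caller supplies them
    if instructions:
        request["instructions"] = instructions
    return request


def synthesize_to_file(client, text, output_path, deployment, **options):
    """Stream synthesized speech to output_path using an existing AzureOpenAI client."""
    request = build_speech_request(text, output_path, deployment, **options)
    with client.audio.speech.with_streaming_response.create(**request) as response:
        response.stream_to_file(output_path)
    return Path(output_path)
```

With the client from the example above, a call might look like synthesize_to_file(client, "Hello!", "greeting.mp3", YOUR_MODEL_DEPLOYMENT, instructions="Speak calmly."). Keeping the request-building step separate from the network call also makes the format logic easy to check without contacting the service.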
