Quickstart: Get started using Azure OpenAI audio generation

2025-07-02

The gpt-4o-audio-preview and gpt-4o-mini-audio-preview models introduce the audio modality into the existing /chat/completions API. The audio model expands the potential for AI applications in text and voice-based interactions and in audio analysis. The gpt-4o-audio-preview and gpt-4o-mini-audio-preview models support the following modalities: text, audio, and text + audio.

Here's a table of the supported modalities with example use cases:

Modality input	Modality output	Example use case
Text	Text + audio	Text to speech, audio book generation
Audio	Text + audio	Audio transcription, audio book generation
Audio	Text	Audio transcription
Text + audio	Text + audio	Audio book generation
Text + audio	Text	Audio transcription

By using audio generation capabilities, you can build more dynamic and interactive AI applications. Models that support audio inputs and outputs let you generate spoken responses to prompts and use audio inputs to prompt the model.
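As a rough illustration of how the rows of the table could map onto a chat completions request, the following Python sketch builds the request arguments for a chosen input and output modality. The helper name `build_audio_request` and its parameters are assumptions made for this example only; the `modalities`, `audio`, and `input_audio` fields follow the request shape used in the samples later in this article.

```python
import base64


def build_audio_request(deployment, text=None, audio_bytes=None, want_audio=True,
                        voice="alloy", audio_format="wav"):
    """Build keyword arguments for client.chat.completions.create (hypothetical helper)."""
    if text is None and audio_bytes is None:
        raise ValueError("Provide text, audio, or both as input.")

    # Input modality: text, audio, or text + audio content parts.
    content = []
    if text is not None:
        content.append({"type": "text", "text": text})
    if audio_bytes is not None:
        content.append({
            "type": "input_audio",
            "input_audio": {
                "data": base64.b64encode(audio_bytes).decode("utf-8"),
                "format": audio_format,
            },
        })

    request = {
        "model": deployment,
        "messages": [{"role": "user", "content": content}],
    }

    # Output modality: text only, or text + audio.
    if want_audio:
        request["modalities"] = ["text", "audio"]
        request["audio"] = {"voice": voice, "format": audio_format}
    else:
        request["modalities"] = ["text"]
    return request


# Text in, text + audio out (the "text to speech" row of the table).
tts_request = build_audio_request("gpt-4o-mini-audio-preview",
                                  text="Is a golden retriever a good family dog?")
print(tts_request["modalities"])
```

Under these assumptions, the resulting dictionary could be passed to the client as `client.chat.completions.create(**tts_request)`.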

Supported models

Currently, only gpt-4o-audio-preview and gpt-4o-mini-audio-preview version 2024-12-17 support audio generation.

For more information about region availability, see the models and versions documentation.

Currently, the following voices are supported for audio output: Alloy, Echo, and Shimmer.

The maximum audio file size is 20 MB.
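The limits above can be checked on the client before a request is sent. The following sketch is one possible way to do that. The helper names are hypothetical, the lowercase voice names mirror the `"alloy"` value used in the samples later in this article, and treating 20 MB as 20 × 1024 × 1024 bytes is an assumption, because the exact byte count isn't stated here.

```python
import os

SUPPORTED_VOICES = {"alloy", "echo", "shimmer"}
MAX_AUDIO_BYTES = 20 * 1024 * 1024  # assumption: "20 MB" is read as 20 MiB


def check_voice(voice):
    """Return the lowercase voice name, or raise if it isn't in the supported list."""
    name = voice.lower()
    if name not in SUPPORTED_VOICES:
        raise ValueError(f"Unsupported voice: {voice!r}")
    return name


def check_audio_size(size_in_bytes):
    """Raise if an audio payload is larger than the documented maximum."""
    if size_in_bytes > MAX_AUDIO_BYTES:
        raise ValueError(
            f"Audio is {size_in_bytes} bytes; the limit is {MAX_AUDIO_BYTES}."
        )
    return size_in_bytes


def check_audio_file(path):
    """Check the size of an audio file on disk before sending it."""
    return check_audio_size(os.path.getsize(path))


print(check_voice("Alloy"))
```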

Note

The Realtime API uses the same underlying GPT-4o audio model as the completions API, but is optimized for low-latency, real-time audio interactions.

API support

Support for audio completions was first added in API version 2025-01-01-preview.

Deploy a model for audio generation

To deploy the gpt-4o-mini-audio-preview model in the Azure AI Foundry portal:

Go to the Azure AI Foundry portal and create or select your project.
Select Models + endpoints from under My assets in the left pane.
Select + Deploy model > Deploy base model to open the deployment window.
Search for and select the gpt-4o-mini-audio-preview model, and then select Confirm.
Review the deployment details and select Deploy.
Follow the wizard to finish deploying the model.

Now that you have a deployment of the gpt-4o-mini-audio-preview model, you can interact with it in the Azure AI Foundry portal Chat playground or through the chat completions API.

Use GPT-4o audio generation

To chat with your deployed gpt-4o-mini-audio-preview model in the Chat playground of the Azure AI Foundry portal, follow these steps:

Go to the Azure AI Foundry portal and select the project that has your deployed gpt-4o-mini-audio-preview model.
Go to your project in Azure AI Foundry.
Select Playgrounds from the left pane.
Select Audio playground > Try the Chat playground.

Note

The Audio playground doesn't support the gpt-4o-mini-audio-preview model. Use the Chat playground as described in this section.

Select your deployed gpt-4o-mini-audio-preview model from the dropdown list.
Start chatting with the model and listen to the audio responses.

You can:
- Record audio prompts.
- Attach audio files to the chat.
- Enter text prompts.

Reference documentation | Library source code | Package (npm) | Samples

Prerequisites

An Azure subscription - Create one for free
Node.js LTS or ESM support.
An Azure OpenAI resource created in one of the supported regions. For more information about region availability, see the models and versions documentation.
Then, you need to deploy a gpt-4o-mini-audio-preview model with your Azure OpenAI resource. For more information, see Create a resource and deploy a model with Azure OpenAI.

Microsoft Entra ID prerequisites

For the recommended keyless authentication with Microsoft Entra ID, you need to:

Install the Azure CLI used for keyless authentication with Microsoft Entra ID.
Assign the Cognitive Services User role to your user account. You can assign roles in the Azure portal under Access control (IAM) > Add role assignment.

Set up

Create a new folder audio-completions-quickstart and go to the quickstart folder with the following command:
```
mkdir audio-completions-quickstart && cd audio-completions-quickstart
```
Create the package.json with the following command:
```
npm init -y
```
Install the OpenAI client library for JavaScript, together with the dotenv package that the sample scripts load at startup:
```
npm install openai dotenv
```
For the recommended keyless authentication with Microsoft Entra ID, install the @azure/identity package with:
```
npm install @azure/identity
```

Retrieve resource information

You need to retrieve the following information to authenticate your application with your Azure OpenAI resource:

Microsoft Entra ID

Variable name	Value
`AZURE_OPENAI_ENDPOINT`	This value can be found in the Keys and Endpoint section when examining your resource from the Azure portal.
`AZURE_OPENAI_DEPLOYMENT_NAME`	This value corresponds to the custom name you chose for your deployment when you deployed a model. This value can be found under Resource Management > Model Deployments in the Azure portal.
`OPENAI_API_VERSION`	Learn more about API Versions. You can change the version in code or use an environment variable.

Learn more about keyless authentication and setting environment variables.

API key

Variable name	Value
`AZURE_OPENAI_ENDPOINT`	This value can be found in the Keys and Endpoint section when examining your resource from the Azure portal.
`AZURE_OPENAI_API_KEY`	This value can be found in the Keys and Endpoint section when examining your resource from the Azure portal. You can use either `KEY1` or `KEY2`.
`AZURE_OPENAI_DEPLOYMENT_NAME`	This value corresponds to the custom name you chose for your deployment when you deployed a model. This value can be found under Resource Management > Model Deployments in the Azure portal.
`OPENAI_API_VERSION`	Learn more about API Versions.

Learn more about finding API keys and setting environment variables.

Important

Use API keys with caution. Don't include the API key directly in your code, and never post it publicly. If you use an API key, store it securely in Azure Key Vault. For more information about using API keys securely in your apps, see API keys with Azure Key Vault.

For more information about AI services security, see Authenticate requests to Azure AI services.

Caution

To use the recommended keyless authentication with the SDK, make sure that the AZURE_OPENAI_API_KEY environment variable isn't set.

Generate audio from text input

Microsoft Entra ID

Create the to-audio.js file with the following code:

```
// Loads variables from a .env file; requires the dotenv package (npm install dotenv).
require("dotenv").config();
const { AzureOpenAI } = require("openai");
const { DefaultAzureCredential, getBearerTokenProvider } = require("@azure/identity");
const { writeFileSync } = require("node:fs");

// Keyless authentication
const credential = new DefaultAzureCredential();
const scope = "https://cognitiveservices.azure.com/.default";
const azureADTokenProvider = getBearerTokenProvider(credential, scope);

// Set environment variables or edit the corresponding values here.
const endpoint = process.env.AZURE_OPENAI_ENDPOINT || "AZURE_OPENAI_ENDPOINT";
const deployment = process.env.AZURE_OPENAI_DEPLOYMENT_NAME || "gpt-4o-mini-audio-preview";
const apiVersion = process.env.OPENAI_API_VERSION || "2025-01-01-preview";

const client = new AzureOpenAI({
    endpoint,
    azureADTokenProvider,
    apiVersion,
    deployment
});

async function main() {
    // Make the audio chat completions request
    const response = await client.chat.completions.create({
        model: "gpt-4o-mini-audio-preview",
        modalities: ["text", "audio"],
        audio: { voice: "alloy", format: "wav" },
        messages: [
            {
                role: "user",
                content: "Is a golden retriever a good family dog?"
            }
        ]
    });

    // Inspect returned data
    console.log(response.choices[0]);

    // Decode the base64 audio data and write it to a file
    writeFileSync(
        "dog.wav",
        Buffer.from(response.choices[0].message.audio.data, "base64")
    );
}

main().catch((err) => {
    console.error("Error occurred:", err);
});

module.exports = { main };
```

Sign in to Azure with the following command:
```
az login
```
Run the JavaScript file.
```
node to-audio.js
```

API key

Create the to-audio.js file with the following code:

```
// Loads variables from a .env file; requires the dotenv package (npm install dotenv).
require("dotenv").config();
const { AzureOpenAI } = require("openai");
const { writeFileSync } = require("node:fs");

// Set environment variables or edit the corresponding values here.
const endpoint = process.env.AZURE_OPENAI_ENDPOINT || "AZURE_OPENAI_ENDPOINT";
const apiKey = process.env.AZURE_OPENAI_API_KEY || "AZURE_OPENAI_API_KEY";
const apiVersion = "2025-01-01-preview";
const deployment = "gpt-4o-mini-audio-preview";

const client = new AzureOpenAI({
    endpoint,
    apiKey,
    apiVersion,
    deployment
});

async function main() {
    // Make the audio chat completions request
    const response = await client.chat.completions.create({
        model: "gpt-4o-mini-audio-preview",
        modalities: ["text", "audio"],
        audio: { voice: "alloy", format: "wav" },
        messages: [
            {
                role: "user",
                content: "Is a golden retriever a good family dog?"
            }
        ]
    });

    // Inspect returned data
    console.log(response.choices[0]);

    // Decode the base64 audio data and write it to a file
    writeFileSync(
        "dog.wav",
        Buffer.from(response.choices[0].message.audio.data, "base64")
    );
}

main().catch((err) => {
    console.error("Error occurred:", err);
});

module.exports = { main };
```

Run the JavaScript file.
```
node to-audio.js
```

Wait a few moments to get the response.

Output for audio generation from text input

The script generates an audio file named dog.wav in the same directory as the script. The audio file contains the spoken response to the prompt, "Is a golden retriever a good family dog?"

Generate audio and text from audio input

Microsoft Entra ID

Create the from-audio.js file with the following code:

```
// Loads variables from a .env file; requires the dotenv package (npm install dotenv).
require("dotenv").config();
const { AzureOpenAI } = require("openai");
const { DefaultAzureCredential, getBearerTokenProvider } = require("@azure/identity");
const fs = require("node:fs").promises;
const { writeFileSync } = require("node:fs");

// Keyless authentication
const credential = new DefaultAzureCredential();
const scope = "https://cognitiveservices.azure.com/.default";
const azureADTokenProvider = getBearerTokenProvider(credential, scope);

// Set environment variables or edit the corresponding values here.
const endpoint = process.env.AZURE_OPENAI_ENDPOINT || "AZURE_OPENAI_ENDPOINT";
const apiVersion = "2025-01-01-preview";
const deployment = "gpt-4o-mini-audio-preview";

const client = new AzureOpenAI({
    endpoint,
    azureADTokenProvider,
    apiVersion,
    deployment
});

async function main() {
    // Read the audio file and base64-encode it for the chat completion input
    const wavBuffer = await fs.readFile("dog.wav");
    const base64str = wavBuffer.toString("base64");

    // Make the audio chat completions request
    const response = await client.chat.completions.create({
        model: "gpt-4o-mini-audio-preview",
        modalities: ["text", "audio"],
        audio: { voice: "alloy", format: "wav" },
        messages: [
            {
                role: "user",
                content: [
                    {
                        type: "text",
                        text: "Describe in detail the spoken audio input."
                    },
                    {
                        type: "input_audio",
                        input_audio: {
                            data: base64str,
                            format: "wav"
                        }
                    }
                ]
            }
        ]
    });

    // Inspect returned data
    console.log(response.choices[0]);

    // Decode the base64 audio data and write it to a file
    writeFileSync(
        "analysis.wav",
        Buffer.from(response.choices[0].message.audio.data, "base64")
    );
}

main().catch((err) => {
    console.error("Error occurred:", err);
});

module.exports = { main };
```

Sign in to Azure with the following command:
```
az login
```
Run the JavaScript file.
```
node from-audio.js
```

API key

Create the from-audio.js file with the following code:

```
// Loads variables from a .env file; requires the dotenv package (npm install dotenv).
require("dotenv").config();
const { AzureOpenAI } = require("openai");
const fs = require("node:fs").promises;
const { writeFileSync } = require("node:fs");

// Set environment variables or edit the corresponding values here.
const endpoint = process.env.AZURE_OPENAI_ENDPOINT || "AZURE_OPENAI_ENDPOINT";
const apiKey = process.env.AZURE_OPENAI_API_KEY || "AZURE_OPENAI_API_KEY";
const apiVersion = "2025-01-01-preview";
const deployment = "gpt-4o-mini-audio-preview";

const client = new AzureOpenAI({
    endpoint,
    apiKey,
    apiVersion,
    deployment
});

async function main() {
    // Read the audio file and base64-encode it for the chat completion input
    const wavBuffer = await fs.readFile("dog.wav");
    const base64str = wavBuffer.toString("base64");

    // Make the audio chat completions request
    const response = await client.chat.completions.create({
        model: "gpt-4o-mini-audio-preview",
        modalities: ["text", "audio"],
        audio: { voice: "alloy", format: "wav" },
        messages: [
            {
                role: "user",
                content: [
                    {
                        type: "text",
                        text: "Describe in detail the spoken audio input."
                    },
                    {
                        type: "input_audio",
                        input_audio: {
                            data: base64str,
                            format: "wav"
                        }
                    }
                ]
            }
        ]
    });

    // Inspect returned data
    console.log(response.choices[0]);

    // Decode the base64 audio data and write it to a file
    writeFileSync(
        "analysis.wav",
        Buffer.from(response.choices[0].message.audio.data, "base64")
    );
}

main().catch((err) => {
    console.error("Error occurred:", err);
});

module.exports = { main };
```

Run the JavaScript file.
```
node from-audio.js
```

Wait a few moments to get the response.

Output for audio and text generation from audio input

The script generates a summary of the spoken audio input. It also generates an audio file named analysis.wav in the same directory as the script. The audio file contains the spoken response to the prompt.

Generate audio and use multi-turn chat completions

Microsoft Entra ID

Create the multi-turn.js file with the following code:

```
// Loads variables from a .env file; requires the dotenv package (npm install dotenv).
require("dotenv").config();
const { AzureOpenAI } = require("openai");
const { DefaultAzureCredential, getBearerTokenProvider } = require("@azure/identity");
const fs = require("node:fs").promises;

// Keyless authentication
const credential = new DefaultAzureCredential();
const scope = "https://cognitiveservices.azure.com/.default";
const azureADTokenProvider = getBearerTokenProvider(credential, scope);

// Set environment variables or edit the corresponding values here.
const endpoint = process.env.AZURE_OPENAI_ENDPOINT || "AZURE_OPENAI_ENDPOINT";
const apiVersion = "2025-01-01-preview";
const deployment = "gpt-4o-mini-audio-preview";

const client = new AzureOpenAI({
    endpoint,
    azureADTokenProvider,
    apiVersion,
    deployment
});

async function main() {
    // Read the audio file and base64-encode it for the chat completion input
    const wavBuffer = await fs.readFile("dog.wav");
    const base64str = wavBuffer.toString("base64");

    // Initialize messages with the first turn's user input
    const messages = [
        {
            role: "user",
            content: [
                {
                    type: "text",
                    text: "Describe in detail the spoken audio input."
                },
                {
                    type: "input_audio",
                    input_audio: {
                        data: base64str,
                        format: "wav"
                    }
                }
            ]
        }
    ];

    // Get the first turn's response
    const response = await client.chat.completions.create({
        model: "gpt-4o-mini-audio-preview",
        modalities: ["text", "audio"],
        audio: { voice: "alloy", format: "wav" },
        messages: messages
    });

    console.log(response.choices[0]);

    // Add a history message referencing the previous turn's audio by ID
    messages.push({
        role: "assistant",
        audio: { id: response.choices[0].message.audio.id }
    });

    // Add a new user message for the second turn
    messages.push({
        role: "user",
        content: [
            {
                type: "text",
                text: "Very concisely summarize the favorability."
            }
        ]
    });

    // Send the follow-up request with the accumulated messages
    const followResponse = await client.chat.completions.create({
        model: "gpt-4o-mini-audio-preview",
        messages: messages
    });

    console.log(followResponse.choices[0].message.content);
}

main().catch((err) => {
    console.error("Error occurred:", err);
});

module.exports = { main };
```

Sign in to Azure with the following command:
```
az login
```
Run the JavaScript file.
```
node multi-turn.js
```

API key

Create the multi-turn.js file with the following code:

```
// Loads variables from a .env file; requires the dotenv package (npm install dotenv).
require("dotenv").config();
const { AzureOpenAI } = require("openai");
const fs = require("node:fs").promises;

// Set environment variables or edit the corresponding values here.
const endpoint = process.env.AZURE_OPENAI_ENDPOINT || "AZURE_OPENAI_ENDPOINT";
const apiKey = process.env.AZURE_OPENAI_API_KEY || "AZURE_OPENAI_API_KEY";
const apiVersion = "2025-01-01-preview";
const deployment = "gpt-4o-mini-audio-preview";

const client = new AzureOpenAI({
    endpoint,
    apiKey,
    apiVersion,
    deployment
});

async function main() {
    // Read the audio file and base64-encode it for the chat completion input
    const wavBuffer = await fs.readFile("dog.wav");
    const base64str = wavBuffer.toString("base64");

    // Initialize messages with the first turn's user input
    const messages = [
        {
            role: "user",
            content: [
                {
                    type: "text",
                    text: "Describe in detail the spoken audio input."
                },
                {
                    type: "input_audio",
                    input_audio: {
                        data: base64str,
                        format: "wav"
                    }
                }
            ]
        }
    ];

    // Get the first turn's response
    const response = await client.chat.completions.create({
        model: "gpt-4o-mini-audio-preview",
        modalities: ["text", "audio"],
        audio: { voice: "alloy", format: "wav" },
        messages: messages
    });

    console.log(response.choices[0]);

    // Add a history message referencing the previous turn's audio by ID
    messages.push({
        role: "assistant",
        audio: { id: response.choices[0].message.audio.id }
    });

    // Add a new user message for the second turn
    messages.push({
        role: "user",
        content: [
            {
                type: "text",
                text: "Very concisely summarize the favorability."
            }
        ]
    });

    // Send the follow-up request with the accumulated messages
    const followResponse = await client.chat.completions.create({
        model: "gpt-4o-mini-audio-preview",
        messages: messages
    });

    console.log(followResponse.choices[0].message.content);
}

main().catch((err) => {
    console.error("Error occurred:", err);
});

module.exports = { main };
```

Run the JavaScript file.
```
node multi-turn.js
```

Wait a few moments to get the response.

Output for multi-turn chat completions

The script generates a summary of the spoken audio input. Then, it makes a multi-turn chat completion to briefly summarize the spoken audio input.

Library source code | Package | Samples

Use this guide to get started generating audio with the Azure OpenAI SDK for Python.

Prerequisites

An Azure subscription. Create one for free.
Python 3.8 or later. We recommend using Python 3.10 or later, but at least Python 3.8 is required. If you don't have a suitable version of Python installed, you can follow the instructions in the VS Code Python Tutorial for the easiest way of installing Python on your operating system.
An Azure OpenAI resource created in one of the supported regions. For more information about region availability, see the models and versions documentation.
Then, you need to deploy a gpt-4o-mini-audio-preview model with your Azure OpenAI resource. For more information, see Create a resource and deploy a model with Azure OpenAI.

Microsoft Entra ID prerequisites

For the recommended keyless authentication with Microsoft Entra ID, you need to:

Install the Azure CLI used for keyless authentication with Microsoft Entra ID.
Assign the Cognitive Services User role to your user account. You can assign roles in the Azure portal under Access control (IAM) > Add role assignment.

Set up

Create a new folder audio-completions-quickstart and go to the quickstart folder with the following command:
```
mkdir audio-completions-quickstart && cd audio-completions-quickstart
```
Create a virtual environment. If you already have Python 3.10 or higher installed, you can create a virtual environment using the following commands:

Windows
```
py -3 -m venv .venv
.venv\scripts\activate
```
Linux
```
python3 -m venv .venv
source .venv/bin/activate
```
macOS
```
python3 -m venv .venv
source .venv/bin/activate
```
Activating the Python environment means that when you run python or pip from the command line, you use the Python interpreter contained in the .venv folder of your application. You can use the deactivate command to exit the Python virtual environment, and you can reactivate it later when needed.
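To confirm which interpreter is in use after activation, a small check like the following sketch may help. The function name `in_virtual_environment` is made up for this example. The check relies on `sys.prefix` differing from `sys.base_prefix` inside an environment created with venv, so it isn't expected to detect conda environments.

```python
import sys


def in_virtual_environment(prefix=None, base_prefix=None):
    """Return True when the interpreter runs inside a venv-style environment."""
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    # In a venv, sys.prefix points at the environment (for example .venv),
    # while sys.base_prefix still points at the base Python installation.
    return prefix != base_prefix


print("Interpreter:", sys.executable)
print("Virtual environment active:", in_virtual_environment())
```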

Tip

We recommend that you create and activate a new Python environment to install the packages you need for this tutorial. Don't install packages into your global Python installation. Always use a virtual or conda environment when installing Python packages; otherwise you can break your global installation of Python.

Install the OpenAI client library for Python with:
```
pip install openai
```
For the recommended keyless authentication with Microsoft Entra ID, install the azure-identity package with:
```
pip install azure-identity
```

Retrieve resource information

You need to retrieve the following information to authenticate your application with your Azure OpenAI resource:

Microsoft Entra ID

Variable name	Value
`AZURE_OPENAI_ENDPOINT`	This value can be found in the Keys and Endpoint section when examining your resource from the Azure portal.
`AZURE_OPENAI_DEPLOYMENT_NAME`	This value corresponds to the custom name you chose for your deployment when you deployed a model. This value can be found under Resource Management > Model Deployments in the Azure portal.
`OPENAI_API_VERSION`	Learn more about API Versions. You can change the version in code or use an environment variable.

Learn more about keyless authentication and setting environment variables.

API key

Variable name	Value
`AZURE_OPENAI_ENDPOINT`	This value can be found in the Keys and Endpoint section when examining your resource from the Azure portal.
`AZURE_OPENAI_API_KEY`	This value can be found in the Keys and Endpoint section when examining your resource from the Azure portal. You can use either `KEY1` or `KEY2`.
`AZURE_OPENAI_DEPLOYMENT_NAME`	This value corresponds to the custom name you chose for your deployment when you deployed a model. This value can be found under Resource Management > Model Deployments in the Azure portal.
`OPENAI_API_VERSION`	Learn more about API Versions.

Learn more about finding API keys and setting environment variables.

Important

For more information about AI services security, see Authenticate requests to Azure AI services.
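One way to gather these variables in a script is sketched below. The `load_settings` helper and the `auth` labels are hypothetical and not part of any SDK. The fallback values for the deployment name and API version are the ones used by the samples in this article, and choosing keyless authentication whenever `AZURE_OPENAI_API_KEY` is unset is an assumption of this sketch.

```python
import os


def load_settings(env=None):
    """Collect Azure OpenAI settings from environment variables (hypothetical helper)."""
    env = os.environ if env is None else env

    endpoint = env.get("AZURE_OPENAI_ENDPOINT")
    if not endpoint:
        raise RuntimeError("Set the AZURE_OPENAI_ENDPOINT environment variable.")

    api_key = env.get("AZURE_OPENAI_API_KEY")
    return {
        "endpoint": endpoint,
        # Fall back to the values used by the samples in this article.
        "deployment": env.get("AZURE_OPENAI_DEPLOYMENT_NAME", "gpt-4o-mini-audio-preview"),
        "api_version": env.get("OPENAI_API_VERSION", "2025-01-01-preview"),
        "api_key": api_key,
        # Keyless (Microsoft Entra ID) authentication when no key is set.
        "auth": "api_key" if api_key else "entra_id",
    }


# Example with an explicit mapping instead of the real environment.
settings = load_settings({"AZURE_OPENAI_ENDPOINT": "https://example.openai.azure.com/"})
print(settings["auth"], settings["api_version"])
```

The endpoint in the example is a placeholder; passing a plain dictionary keeps the sketch runnable without touching real environment variables.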

Generate audio from text input

Microsoft Entra ID

Create the to-audio.py file with the following code:

```
import base64
import os
from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

token_provider = get_bearer_token_provider(DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default")

# Set environment variables or edit the corresponding values here.
endpoint = os.environ['AZURE_OPENAI_ENDPOINT']

# Keyless authentication
client = AzureOpenAI(
    azure_ad_token_provider=token_provider,
    azure_endpoint=endpoint,
    api_version="2025-01-01-preview"
)

# Make the audio chat completions request
completion = client.chat.completions.create(
    model="gpt-4o-mini-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[
        {
            "role": "user",
            "content": "Is a golden retriever a good family dog?"
        }
    ]
)

print(completion.choices[0])

# Decode the base64 audio data and write it to a file
wav_bytes = base64.b64decode(completion.choices[0].message.audio.data)
with open("dog.wav", "wb") as f:
    f.write(wav_bytes)
```

Run the Python file.
```
python to-audio.py
```

API key

Create the to-audio.py file with the following code:

```
import base64
import os
from openai import AzureOpenAI

# Set environment variables or edit the corresponding values here.
endpoint = os.environ['AZURE_OPENAI_ENDPOINT']
api_key = os.environ['AZURE_OPENAI_API_KEY']

client = AzureOpenAI(
    api_version="2025-01-01-preview",
    api_key=api_key,
    azure_endpoint=endpoint
)

# Make the audio chat completions request
completion = client.chat.completions.create(
    model="gpt-4o-mini-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[
        {
            "role": "user",
            "content": "Is a golden retriever a good family dog?"
        }
    ]
)

print(completion.choices[0])

# Decode the base64 audio data and write it to a file
wav_bytes = base64.b64decode(completion.choices[0].message.audio.data)
with open("dog.wav", "wb") as f:
    f.write(wav_bytes)
```

Run the Python file.
```
python to-audio.py
```

Wait a few moments to get the response.

Output for audio generation from text input

The script generates an audio file named dog.wav in the same directory as the script. The audio file contains the spoken response to the prompt, "Is a golden retriever a good family dog?"
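To sanity-check decoded audio before playing it, the standard library `wave` module can report basic properties. The sketch below assumes the decoded data is a standard PCM WAV file. Because it can't call the service, it builds a short silent clip as a stand-in for the model's response; `describe_wav` and `make_silent_wav` are names invented for this example.

```python
import base64
import io
import wave


def describe_wav(wav_bytes):
    """Return basic properties of WAV data using the standard library."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_seconds": frames / rate,
        }


def make_silent_wav(seconds=1, sample_rate=8000):
    """Create a short silent mono WAV, standing in for a real model response."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * sample_rate * seconds)
    return buffer.getvalue()


# Simulate the base64 string found in completion.choices[0].message.audio.data.
encoded = base64.b64encode(make_silent_wav()).decode("utf-8")
info = describe_wav(base64.b64decode(encoded))
print(info)
```

To inspect the file written by the script instead, the same helper could be called with the bytes read from dog.wav.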

Generate audio and text from audio input

Microsoft Entra ID

Create the from-audio.py file with the following code:

```
import base64
import os
from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

token_provider = get_bearer_token_provider(DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default")

# Set environment variables or edit the corresponding values here.
endpoint = os.environ['AZURE_OPENAI_ENDPOINT']

# Keyless authentication
client = AzureOpenAI(
    azure_ad_token_provider=token_provider,
    azure_endpoint=endpoint,
    api_version="2025-01-01-preview"
)

# Read and encode audio file
with open('dog.wav', 'rb') as wav_reader:
    encoded_string = base64.b64encode(wav_reader.read()).decode('utf-8')

# Make the audio chat completions request
completion = client.chat.completions.create(
    model="gpt-4o-mini-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe in detail the spoken audio input."
                },
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": encoded_string,
                        "format": "wav"
                    }
                }
            ]
        },
    ]
)

# Print the transcript of the spoken response
print(completion.choices[0].message.audio.transcript)

# Decode the base64 audio data and write it to a file
wav_bytes = base64.b64decode(completion.choices[0].message.audio.data)
with open("analysis.wav", "wb") as f:
    f.write(wav_bytes)
```

Run the Python file.
```
python from-audio.py
```

API key

Create the from-audio.py file with the following code:

```
import base64
import os
from openai import AzureOpenAI

# Set environment variables or edit the corresponding values here.
endpoint = os.environ['AZURE_OPENAI_ENDPOINT']
api_key = os.environ['AZURE_OPENAI_API_KEY']

client = AzureOpenAI(
    api_version="2025-01-01-preview",
    api_key=api_key,
    azure_endpoint=endpoint,
)

# Read and encode audio file
with open('dog.wav', 'rb') as wav_reader:
    encoded_string = base64.b64encode(wav_reader.read()).decode('utf-8')

# Make the audio chat completions request
completion = client.chat.completions.create(
    model="gpt-4o-mini-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe in detail the spoken audio input."
                },
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": encoded_string,
                        "format": "wav"
                    }
                }
            ]
        },
    ]
)

# Print the transcript of the spoken response
print(completion.choices[0].message.audio.transcript)

# Decode the base64 audio data and write it to a file
wav_bytes = base64.b64decode(completion.choices[0].message.audio.data)
with open("analysis.wav", "wb") as f:
    f.write(wav_bytes)
```

Run the Python file.
```
python from-audio.py
```

Wait a few moments to get the response.

Output for audio and text generation from audio input

The script prints the transcript of the model's spoken response, which describes the audio input. It also generates an audio file named analysis.wav in the same directory as the script. The audio file contains the spoken response to the prompt.
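To confirm that the decoded output is a playable WAV file, its header can be inspected before or after writing it to disk. The following is a minimal sketch, assuming the returned data is plain PCM WAV that Python's standard wave module can parse. The helper names describe_wav and make_silent_wav_b64 are hypothetical, and the silent clip only stands in for real API output.

```python
import base64
import io
import wave


def describe_wav(audio_b64: str) -> dict:
    """Decode base64 WAV data and return basic properties of the audio."""
    wav_bytes = base64.b64decode(audio_b64)
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_seconds": frames / rate if rate else 0.0,
        }


def make_silent_wav_b64(seconds: float = 0.5, rate: int = 8000) -> str:
    """Build a short silent mono 16-bit WAV (a stand-in for API output)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return base64.b64encode(buffer.getvalue()).decode("utf-8")


if __name__ == "__main__":
    # In the script above, you would pass completion.choices[0].message.audio.data
    print(describe_wav(make_silent_wav_b64()))
```

In from-audio.py, the same check could be applied to completion.choices[0].message.audio.data before the file is written.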

Generate audio and use multi-turn chat completions

Microsoft Entra ID
API key

Create the multi-turn.py file with the following code:

import base64 
import os 
from openai import AzureOpenAI 
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

token_provider=get_bearer_token_provider(DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default")

# Set environment variables or edit the corresponding values here.
endpoint = os.environ['AZURE_OPENAI_ENDPOINT']

# Keyless authentication
client=AzureOpenAI(
    azure_ad_token_provider=token_provider,
    azure_endpoint=endpoint,
    api_version="2025-01-01-preview"
)

# Read and encode audio file  
with open('dog.wav', 'rb') as wav_reader: 
    encoded_string = base64.b64encode(wav_reader.read()).decode('utf-8') 

# Initialize messages with the first turn's user input 
messages = [
    { 
        "role": "user", 
        "content": [ 
            { "type": "text", "text": "Describe in detail the spoken audio input." }, 
            { "type": "input_audio", 
                "input_audio": { 
                    "data": encoded_string, 
                    "format": "wav" 
                } 
            } 
        ] 
    }] 

# Get the first turn's response

completion = client.chat.completions.create( 
    model="gpt-4o-mini-audio-preview", 
    modalities=["text", "audio"], 
    audio={"voice": "alloy", "format": "wav"}, 
    messages=messages
) 

print("Get the first turn's response:")
print(completion.choices[0].message.audio.transcript) 

print("Add a history message referencing the first turn's audio by ID:")
print(completion.choices[0].message.audio.id)

# Add a history message referencing the first turn's audio by ID 
messages.append({ 
    "role": "assistant", 
    "audio": { "id": completion.choices[0].message.audio.id } 
}) 

# Add the next turn's user message 
messages.append({ 
    "role": "user", 
    "content": "Very briefly, summarize the favorability." 
}) 

# Send the follow-up request with the accumulated messages
completion = client.chat.completions.create( 
    model="gpt-4o-mini-audio-preview", 
    messages=messages
) 

print("Very briefly, summarize the favorability.")
print(completion.choices[0].message.content)

Run the Python file.
```
python multi-turn.py
```

Create the multi-turn.py file with the following code:

import base64 
import os 
from openai import AzureOpenAI 

# Set environment variables or edit the corresponding values here.
endpoint = os.environ['AZURE_OPENAI_ENDPOINT']
api_key = os.environ['AZURE_OPENAI_API_KEY']

client = AzureOpenAI(
    api_version="2025-01-01-preview",  
    api_key=api_key, 
    azure_endpoint=endpoint
)

# Read and encode audio file  
with open('dog.wav', 'rb') as wav_reader: 
    encoded_string = base64.b64encode(wav_reader.read()).decode('utf-8') 

# Initialize messages with the first turn's user input 
messages = [
    { 
        "role": "user", 
        "content": [ 
            { "type": "text", "text": "Describe in detail the spoken audio input." }, 
            { "type": "input_audio", 
                "input_audio": { 
                    "data": encoded_string, 
                    "format": "wav" 
                } 
            } 
        ] 
    }] 

# Get the first turn's response 

completion = client.chat.completions.create( 
    model="gpt-4o-mini-audio-preview", 
    modalities=["text", "audio"], 
    audio={"voice": "alloy", "format": "wav"}, 
    messages=messages
) 

print("Get the first turn's response:")
print(completion.choices[0].message.audio.transcript) 

print("Add a history message referencing the first turn's audio by ID:")
print(completion.choices[0].message.audio.id)

# Add a history message referencing the first turn's audio by ID 
messages.append({ 
    "role": "assistant", 
    "audio": { "id": completion.choices[0].message.audio.id } 
}) 

# Add the next turn's user message 
messages.append({ 
    "role": "user", 
    "content": "Very briefly, summarize the favorability." 
}) 

# Send the follow-up request with the accumulated messages 
completion = client.chat.completions.create( 
    model="gpt-4o-mini-audio-preview", 
    messages=messages
) 

print("Very briefly, summarize the favorability.")
print(completion.choices[0].message.content)

Run the Python file.
```
python multi-turn.py
```

Wait a few moments to get the response.

Output for multi-turn chat completions

The script prints the transcript of the model's spoken response, which describes the audio input. It then sends a second, multi-turn chat completion request to briefly summarize that input.
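The multi-turn pattern in the script comes down to two appended messages: an assistant message that references the first turn's audio by ID, and the next user prompt. The sketch below isolates that step so it can be checked without calling the service. The helper name add_followup_turn and the ID value audio_abc123 are hypothetical placeholders; in the script, the ID comes from completion.choices[0].message.audio.id.

```python
def add_followup_turn(messages: list, audio_id: str, followup_text: str) -> list:
    """Return a new history with the assistant audio reference and the next user prompt."""
    return messages + [
        # Reference the previously generated audio instead of resending it
        {"role": "assistant", "audio": {"id": audio_id}},
        {"role": "user", "content": followup_text},
    ]


first_turn = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe in detail the spoken audio input."}
        ],
    }
]

# "audio_abc123" is a made-up ID used only for illustration
history = add_followup_turn(
    first_turn, "audio_abc123", "Very briefly, summarize the favorability."
)
print([message["role"] for message in history])
```

Returning a new list rather than mutating the input keeps the first-turn messages reusable if the follow-up request needs to be retried with a different prompt.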

REST API Spec

Prerequisites

An Azure subscription. Create one for free.
Python 3.8 or later. We recommend Python 3.10 or later, but Python 3.8 is the minimum. If you don't have a suitable version of Python installed, you can follow the instructions in the VS Code Python Tutorial for the easiest way to install Python on your operating system.
An Azure OpenAI resource created in one of the supported regions. For more information about region availability, see the models and versions documentation.
Then, you need to deploy a gpt-4o-mini-audio-preview model with your Azure OpenAI resource. For more information, see Create a resource and deploy a model with Azure OpenAI.

Microsoft Entra ID prerequisites

For the recommended keyless authentication with Microsoft Entra ID, you need to:

Install the Azure CLI, which is used for keyless authentication with Microsoft Entra ID.
Assign the Cognitive Services User role to your user account. You can assign roles in the Azure portal under Access control (IAM) > Add role assignment.

Setup

Create a new folder named audio-completions-quickstart and go to it with the following command:
```
mkdir audio-completions-quickstart && cd audio-completions-quickstart
```
Create a virtual environment. If you already have Python 3.10 or later installed, you can create a virtual environment with the following commands:
Windows:
```
py -3 -m venv .venv
.venv\scripts\activate
```
Linux:
```
python3 -m venv .venv
source .venv/bin/activate
```
macOS:
```
python3 -m venv .venv
source .venv/bin/activate
```
Activating the Python environment means that when you run python or pip from the command line, you use the Python interpreter contained in the .venv folder of your application. You can use the deactivate command to exit the Python virtual environment, and you can reactivate it later when needed.
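If you're unsure whether the virtual environment is active, Python can report it. This is a small sketch that relies on the standard behavior of venv, where sys.prefix points at the environment and sys.base_prefix points at the base installation. The function name in_virtual_environment is hypothetical.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter is running inside a venv-style environment."""
    return sys.prefix != sys.base_prefix


# Show which interpreter is running and whether a virtual environment is active
print("Interpreter:", sys.executable)
print("Virtual environment active:", in_virtual_environment())
```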

Tip

We recommend that you create and activate a new Python environment to install the packages you need for this tutorial. Don't install packages into your global Python installation. Always use a virtual or conda environment when you install Python packages; otherwise, you can break your global installation of Python.
Install the OpenAI client library for Python, along with the requests package that the REST examples in this section import:
```
pip install openai requests
```
For the recommended keyless authentication with Microsoft Entra ID, install the azure-identity package with:
```
pip install azure-identity
```

Retrieve resource information

You need to retrieve the following information to authenticate your application with your Azure OpenAI resource:

Microsoft Entra ID
API key

Variable name	Value
`AZURE_OPENAI_ENDPOINT`	This value can be found in the Keys and Endpoint section when you examine your resource in the Azure portal.
`AZURE_OPENAI_DEPLOYMENT_NAME`	This value corresponds to the custom name you chose for your deployment when you deployed a model. It can be found under Resource Management > Model Deployments in the Azure portal.
`OPENAI_API_VERSION`	Learn more about API Versions. You can change the version in code or use an environment variable.

Learn more about keyless authentication and setting environment variables.

Variable name	Value
`AZURE_OPENAI_ENDPOINT`	This value can be found in the Keys and Endpoint section when you examine your resource in the Azure portal.
`AZURE_OPENAI_API_KEY`	This value can be found in the Keys and Endpoint section when you examine your resource in the Azure portal. You can use either `KEY1` or `KEY2`.
`AZURE_OPENAI_DEPLOYMENT_NAME`	This value corresponds to the custom name you chose for your deployment when you deployed a model. It can be found under Resource Management > Model Deployments in the Azure portal.
`OPENAI_API_VERSION`	Learn more about API Versions.

Learn more about finding API keys and setting environment variables.
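The scripts in this quickstart read these values with os.environ[...], which raises a bare KeyError when a variable is missing. As a sketch of a friendlier alternative, the hypothetical load_settings helper below reports every missing variable at once and falls back to the API version used throughout this article when OPENAI_API_VERSION isn't set. The example endpoint and key are placeholders, not real values.

```python
import os


def load_settings(
    required=("AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY"), env=None
) -> dict:
    """Collect required settings, failing with one clear message if any are missing."""
    env = os.environ if env is None else env
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    settings = {name: env[name] for name in required}
    # Fall back to the API version used in this quickstart
    settings["OPENAI_API_VERSION"] = env.get("OPENAI_API_VERSION", "2025-01-01-preview")
    return settings


if __name__ == "__main__":
    # Placeholder values for illustration only
    example_env = {
        "AZURE_OPENAI_ENDPOINT": "https://example.openai.azure.com",
        "AZURE_OPENAI_API_KEY": "placeholder-key",
    }
    print(load_settings(env=example_env))
```

For keyless authentication, you could call load_settings(required=("AZURE_OPENAI_ENDPOINT",)) so that no API key is expected.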

Important

For more information about AI services security, see Authenticate requests to Azure AI services.

Generate audio from text input

Microsoft Entra ID
API key

Create the to-audio.py file with the following code:

import requests
import base64
import os
from azure.identity import DefaultAzureCredential

# Set environment variables or edit the corresponding values here.
endpoint = os.environ['AZURE_OPENAI_ENDPOINT']

# Keyless authentication
credential = DefaultAzureCredential()
token = credential.get_token("https://cognitiveservices.azure.com/.default")

api_version = '2025-01-01-preview'
url = f"{endpoint}/openai/deployments/gpt-4o-mini-audio-preview/chat/completions?api-version={api_version}"
headers= { "Authorization": f"Bearer {token.token}", "Content-Type": "application/json" }
body = {
  "modalities": ["audio", "text"],
  "model": "gpt-4o-mini-audio-preview",
  "audio": {
      "format": "wav",
      "voice": "alloy"
  },
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Is a golden retriever a good family dog?"
        }
      ]
    }
  ]
}

# Make the audio chat completions request
completion = requests.post(url, headers=headers, json=body)
audio_data = completion.json()['choices'][0]['message']['audio']['data']

# Write the output audio data to a file
wav_bytes = base64.b64decode(audio_data)
with open("dog.wav", "wb") as f: 
  f.write(wav_bytes)

Run the Python file.
```
python to-audio.py
```

Create the to-audio.py file with the following code:

import requests
import base64
import os

# Set environment variables or edit the corresponding values here.
endpoint = os.environ['AZURE_OPENAI_ENDPOINT']
api_key = os.environ['AZURE_OPENAI_API_KEY']

api_version = '2025-01-01-preview'
url = f"{endpoint}/openai/deployments/gpt-4o-mini-audio-preview/chat/completions?api-version={api_version}"
headers= { "api-key": api_key, "Content-Type": "application/json" }
body = {
  "modalities": ["audio", "text"],
  "model": "gpt-4o-mini-audio-preview",
  "audio": {
      "format": "wav",
      "voice": "alloy"
  },
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Is a golden retriever a good family dog?"
        }
      ]
    }
  ]
}

# Make the audio chat completions request
completion = requests.post(url, headers=headers, json=body)
audio_data = completion.json()['choices'][0]['message']['audio']['data']

# Write the output audio data to a file 
wav_bytes = base64.b64decode(audio_data)
with open("dog.wav", "wb") as f: 
  f.write(wav_bytes)

Run the Python file.
```
python to-audio.py
```

Wait a few moments to get the response.

Output for audio generation from text input

The script generates an audio file named dog.wav in the same directory as the script. The audio file contains the spoken response to the prompt, "Is a golden retriever a good family dog?"
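The REST scripts index directly into completion.json(), so a failed request surfaces as a KeyError rather than a readable message. The sketch below shows one way to guard that step. It assumes that failed requests return a JSON body with a top-level error object, which you should verify against the responses you actually receive; the helper name extract_audio_bytes is hypothetical.

```python
import base64


def extract_audio_bytes(payload: dict) -> bytes:
    """Return decoded audio bytes from a chat completions JSON payload."""
    # Assumption: failed requests carry a top-level "error" object
    if "error" in payload:
        raise RuntimeError(f"Request failed: {payload['error']}")
    message = payload["choices"][0]["message"]
    audio = message.get("audio")
    if not audio or "data" not in audio:
        raise ValueError("The response contains no audio data.")
    return base64.b64decode(audio["data"])


if __name__ == "__main__":
    # Fake payload shaped like the fields the scripts above read
    fake = {
        "choices": [
            {"message": {"audio": {"data": base64.b64encode(b"RIFF").decode("utf-8")}}}
        ]
    }
    print(len(extract_audio_bytes(fake)), "bytes decoded")
```

In to-audio.py, this would replace the direct lookup with extract_audio_bytes(completion.json()); calling completion.raise_for_status() first also turns HTTP errors into exceptions.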

Generate audio and text from audio input

Microsoft Entra ID
API key

Create the from-audio.py file with the following code:

import requests
import base64
import os
from azure.identity import DefaultAzureCredential

# Set environment variables or edit the corresponding values here.
endpoint = os.environ['AZURE_OPENAI_ENDPOINT']

# Keyless authentication
credential = DefaultAzureCredential()
token = credential.get_token("https://cognitiveservices.azure.com/.default")

# Read and encode audio file  
with open('dog.wav', 'rb') as wav_reader: 
  encoded_string = base64.b64encode(wav_reader.read()).decode('utf-8') 

api_version = '2025-01-01-preview'
url = f"{endpoint}/openai/deployments/gpt-4o-mini-audio-preview/chat/completions?api-version={api_version}"
headers= { "Authorization": f"Bearer {token.token}", "Content-Type": "application/json" }
body = {
  "modalities": ["audio", "text"],
  "model": "gpt-4o-mini-audio-preview",
  "audio": {
      "format": "wav",
      "voice": "alloy"
  },
  "messages": [
    { 
        "role": "user", 
        "content": [ 
            {  
                "type": "text", 
                "text": "Describe in detail the spoken audio input." 
            }, 
            { 
                "type": "input_audio", 
                "input_audio": { 
                    "data": encoded_string, 
                    "format": "wav" 
                } 
            } 
        ] 
    }, 
  ]
}

completion = requests.post(url, headers=headers, json=body)

print(completion.json()['choices'][0]['message']['audio']['transcript'])

# Write the output audio data to a file
audio_data = completion.json()['choices'][0]['message']['audio']['data'] 
wav_bytes = base64.b64decode(audio_data)
with open("analysis.wav", "wb") as f: 
  f.write(wav_bytes)

Run the Python file.
```
python from-audio.py
```

Create the from-audio.py file with the following code:

import requests
import base64
import os

# Set environment variables or edit the corresponding values here.
endpoint = os.environ['AZURE_OPENAI_ENDPOINT']
api_key = os.environ['AZURE_OPENAI_API_KEY']

# Read and encode audio file  
with open('dog.wav', 'rb') as wav_reader: 
  encoded_string = base64.b64encode(wav_reader.read()).decode('utf-8') 

api_version = '2025-01-01-preview'
url = f"{endpoint}/openai/deployments/gpt-4o-mini-audio-preview/chat/completions?api-version={api_version}"
headers= { "api-key": api_key, "Content-Type": "application/json" }
body = {
  "modalities": ["audio", "text"],
  "model": "gpt-4o-mini-audio-preview",
  "audio": {
      "format": "wav",
      "voice": "alloy"
  },
  "messages": [
    { 
        "role": "user", 
        "content": [ 
            {  
                "type": "text", 
                "text": "Describe in detail the spoken audio input." 
            }, 
            { 
                "type": "input_audio", 
                "input_audio": { 
                    "data": encoded_string, 
                    "format": "wav" 
                } 
            } 
        ] 
    }, 
  ]
}

completion = requests.post(url, headers=headers, json=body)

print(completion.json()['choices'][0]['message']['audio']['transcript'])

# Write the output audio data to a file
audio_data = completion.json()['choices'][0]['message']['audio']['data'] 
wav_bytes = base64.b64decode(audio_data)
with open("analysis.wav", "wb") as f: 
  f.write(wav_bytes)

Run the Python file.
```
python from-audio.py
```

Wait a few moments to get the response.

Output for audio and text generation from audio input

The script prints the transcript of the model's spoken response, which describes the audio input. It also generates an audio file named analysis.wav in the same directory as the script. The audio file contains the spoken response to the prompt.
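This article states a maximum audio file size of 20 MB, so it can be worth checking the input before encoding and uploading it. The following sketch assumes the limit applies to the size of the file on disk (whether it applies to the raw or the base64-encoded size isn't specified here) and uses a hypothetical helper named encode_audio_file.

```python
import base64
import os
import tempfile

MAX_AUDIO_BYTES = 20 * 1024 * 1024  # 20 MB limit stated in this article


def encode_audio_file(path: str, max_bytes: int = MAX_AUDIO_BYTES) -> str:
    """Base64-encode an audio file after checking it against the size limit."""
    size = os.path.getsize(path)
    if size > max_bytes:
        raise ValueError(f"{path} is {size} bytes; the limit is {max_bytes} bytes.")
    with open(path, "rb") as audio_file:
        return base64.b64encode(audio_file.read()).decode("utf-8")


if __name__ == "__main__":
    # Demonstrate with a tiny temporary file instead of a real recording
    with tempfile.TemporaryDirectory() as folder:
        sample = os.path.join(folder, "sample.wav")
        with open(sample, "wb") as f:
            f.write(b"RIFF")
        print(encode_audio_file(sample))
```

In from-audio.py, encoded_string = encode_audio_file("dog.wav") would stand in for the open-and-encode lines.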

Generate audio and use multi-turn chat completions

Microsoft Entra ID
API key

Create the multi-turn.py file with the following code:

import requests
import base64
import os
from azure.identity import DefaultAzureCredential

# Set environment variables or edit the corresponding values here.
endpoint = os.environ['AZURE_OPENAI_ENDPOINT']

# Keyless authentication
credential = DefaultAzureCredential()
token = credential.get_token("https://cognitiveservices.azure.com/.default")

api_version = '2025-01-01-preview'
url = f"{endpoint}/openai/deployments/gpt-4o-mini-audio-preview/chat/completions?api-version={api_version}"
headers= { "Authorization": f"Bearer {token.token}", "Content-Type": "application/json" }

# Read and encode audio file  
with open('dog.wav', 'rb') as wav_reader: 
  encoded_string = base64.b64encode(wav_reader.read()).decode('utf-8') 

# Initialize messages with the first turn's user input 
messages = [
    { 
        "role": "user", 
        "content": [ 
            {  
                "type": "text", 
                "text": "Describe in detail the spoken audio input." 
            }, 
            { 
                "type": "input_audio", 
                "input_audio": { 
                    "data": encoded_string, 
                    "format": "wav" 
                } 
            } 
        ] 
    }] 

body = {
  "modalities": ["audio", "text"],
  "model": "gpt-4o-mini-audio-preview",
  "audio": {
      "format": "wav",
      "voice": "alloy"
  },
  "messages": messages
}

# Get the first turn's response, including generated audio 
completion = requests.post(url, headers=headers, json=body)

print("Get the first turn's response:")
print(completion.json()['choices'][0]['message']['audio']['transcript']) 

print("Add a history message referencing the first turn's audio by ID:")
print(completion.json()['choices'][0]['message']['audio']['id'])

# Add a history message referencing the first turn's audio by ID 
messages.append({ 
    "role": "assistant", 
    "audio": { "id": completion.json()['choices'][0]['message']['audio']['id'] } 
}) 

# Add the next turn's user message 
messages.append({ 
    "role": "user", 
    "content": "Very briefly, summarize the favorability." 
}) 

body = {
  "model": "gpt-4o-mini-audio-preview",
  "messages": messages
}

# Send the follow-up request with the accumulated messages
completion = requests.post(url, headers=headers, json=body) 

print("Very briefly, summarize the favorability.")
print(completion.json()['choices'][0]['message']['content'])

Run the Python file.
```
python multi-turn.py
```

Create the multi-turn.py file with the following code:

import requests
import base64
import os

# Set environment variables or edit the corresponding values here.
endpoint = os.environ['AZURE_OPENAI_ENDPOINT']
api_key = os.environ['AZURE_OPENAI_API_KEY']

api_version = '2025-01-01-preview'
url = f"{endpoint}/openai/deployments/gpt-4o-mini-audio-preview/chat/completions?api-version={api_version}"
headers= { "api-key": api_key, "Content-Type": "application/json" }

# Read and encode audio file  
with open('dog.wav', 'rb') as wav_reader: 
  encoded_string = base64.b64encode(wav_reader.read()).decode('utf-8') 

# Initialize messages with the first turn's user input 
messages = [
    { 
        "role": "user", 
        "content": [ 
            {  
                "type": "text", 
                "text": "Describe in detail the spoken audio input." 
            }, 
            { 
                "type": "input_audio", 
                "input_audio": { 
                    "data": encoded_string, 
                    "format": "wav" 
                } 
            } 
        ] 
    }] 

body = {
  "modalities": ["audio", "text"],
  "model": "gpt-4o-mini-audio-preview",
  "audio": {
      "format": "wav",
      "voice": "alloy"
  },
  "messages": messages
}


# Get the first turn's response, including generated audio 
completion = requests.post(url, headers=headers, json=body)

print("Get the first turn's response:")
print(completion.json()['choices'][0]['message']['audio']['transcript']) 

print("Add a history message referencing the first turn's audio by ID:")
print(completion.json()['choices'][0]['message']['audio']['id'])

# Add a history message referencing the first turn's audio by ID 
messages.append({ 
    "role": "assistant", 
    "audio": { "id": completion.json()['choices'][0]['message']['audio']['id'] } 
}) 

# Add the next turn's user message 
messages.append({ 
    "role": "user", 
    "content": "Very briefly, summarize the favorability." 
}) 

body = {
  "model": "gpt-4o-mini-audio-preview",
  "messages": messages
}

# Send the follow-up request with the accumulated messages
completion = requests.post(url, headers=headers, json=body) 

print("Very briefly, summarize the favorability.")
print(completion.json()['choices'][0]['message']['content'])

Run the Python file.
```
python multi-turn.py
```

Wait a few moments to get the response.

Output for multi-turn chat completions

The script prints the transcript of the model's spoken response, which describes the audio input. It then sends a second, multi-turn chat completion request to briefly summarize that input.
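The REST scripts build the request URL with an f-string, which produces a double slash when AZURE_OPENAI_ENDPOINT ends with "/". A small helper can normalize that. This is a sketch only: the function name chat_completions_url is hypothetical, and the path layout is copied from the scripts above rather than from a separate specification.

```python
from urllib.parse import quote


def chat_completions_url(endpoint: str, deployment: str, api_version: str) -> str:
    """Build the chat completions URL, tolerating a trailing slash on the endpoint."""
    base = endpoint.rstrip("/")
    return (
        f"{base}/openai/deployments/{quote(deployment, safe='')}"
        f"/chat/completions?api-version={quote(api_version, safe='')}"
    )


if __name__ == "__main__":
    # The endpoint below is a placeholder, not a real resource
    print(
        chat_completions_url(
            "https://example.openai.azure.com/",
            "gpt-4o-mini-audio-preview",
            "2025-01-01-preview",
        )
    )
```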

Reference documentation | Library source code | Package (npm) | Samples

Prerequisites

An Azure subscription. Create one for free.
Node.js LTS or ESM support.
TypeScript installed globally.
An Azure OpenAI resource created in one of the supported regions. For more information about region availability, see the models and versions documentation.
Then, you need to deploy a gpt-4o-mini-audio-preview model with your Azure OpenAI resource. For more information, see Create a resource and deploy a model with Azure OpenAI.

Microsoft Entra ID prerequisites

For the recommended keyless authentication with Microsoft Entra ID, you need to:

Install the Azure CLI, which is used for keyless authentication with Microsoft Entra ID.
Assign the Cognitive Services User role to your user account. You can assign roles in the Azure portal under Access control (IAM) > Add role assignment.

Setup

Create a new folder named audio-completions-quickstart and go to it with the following command:
```
mkdir audio-completions-quickstart && cd audio-completions-quickstart
```
Create the package.json with the following command:
```
npm init -y
```
Update the package.json to use ECMAScript modules with the following command:
```
npm pkg set type=module
```
Install the OpenAI client library for JavaScript with:
```
npm install openai
```
For the recommended keyless authentication with Microsoft Entra ID, install the @azure/identity package with:
```
npm install @azure/identity
```

Retrieve resource information

You need to retrieve the following information to authenticate your application with your Azure OpenAI resource:

Microsoft Entra ID
API key

Variable name	Value
`AZURE_OPENAI_ENDPOINT`	This value can be found in the Keys and Endpoint section when you examine your resource in the Azure portal.
`AZURE_OPENAI_DEPLOYMENT_NAME`	This value corresponds to the custom name you chose for your deployment when you deployed a model. It can be found under Resource Management > Model Deployments in the Azure portal.
`OPENAI_API_VERSION`	Learn more about API Versions. You can change the version in code or use an environment variable.

Learn more about keyless authentication and setting environment variables.

Variable name	Value
`AZURE_OPENAI_ENDPOINT`	This value can be found in the Keys and Endpoint section when you examine your resource in the Azure portal.
`AZURE_OPENAI_API_KEY`	This value can be found in the Keys and Endpoint section when you examine your resource in the Azure portal. You can use either `KEY1` or `KEY2`.
`AZURE_OPENAI_DEPLOYMENT_NAME`	This value corresponds to the custom name you chose for your deployment when you deployed a model. It can be found under Resource Management > Model Deployments in the Azure portal.
`OPENAI_API_VERSION`	Learn more about API Versions.

Learn more about finding API keys and setting environment variables.

Important

For more information about AI services security, see Authenticate requests to Azure AI services.

Caution

To use the recommended keyless authentication with the SDK, make sure that the AZURE_OPENAI_API_KEY environment variable isn't set.

Generate audio from text input

Microsoft Entra ID
API key

Create the to-audio.ts file with the following code:

import { writeFileSync } from "node:fs";
import { AzureOpenAI } from "openai";
import {
    DefaultAzureCredential,
    getBearerTokenProvider,
  } from "@azure/identity";

// Set environment variables or edit the corresponding values here.
const endpoint: string = process.env.AZURE_OPENAI_ENDPOINT || "AZURE_OPENAI_ENDPOINT";
const deployment: string = process.env.AZURE_OPENAI_DEPLOYMENT_NAME || "gpt-4o-mini-audio-preview"; 
const apiVersion: string = process.env.OPENAI_API_VERSION || "2025-01-01-preview"; 

// Keyless authentication 
const getClient = (): AzureOpenAI => {
    const credential = new DefaultAzureCredential();
    const scope = "https://cognitiveservices.azure.com/.default";
    const azureADTokenProvider = getBearerTokenProvider(credential, scope);
    const client = new AzureOpenAI({
      endpoint: endpoint,
      apiVersion: apiVersion,
      azureADTokenProvider,
    });
    return client;
};

const client = getClient();

async function main(): Promise<void> {

    // Make the audio chat completions request
    const response = await client.chat.completions.create({ 
        model: "gpt-4o-mini-audio-preview", 
        modalities: ["text", "audio"], 
        audio: { voice: "alloy", format: "wav" }, 
        messages: [ 
        { 
            role: "user", 
            content: "Is a golden retriever a good family dog?" 
        } 
        ] 
    }); 

  // Inspect returned data 
  console.log(response.choices[0]); 

  // Write the output audio data to a file
  if (response.choices[0].message.audio) {
    // Buffer data is written as-is, so no encoding option is needed
    writeFileSync(
      "dog.wav",
      Buffer.from(response.choices[0].message.audio.data, "base64")
    );
  } else {
    console.error("Audio data is null or undefined.");
  }
}

main().catch((err: Error) => {
  console.error("Error occurred:", err);
});

export { main };

Create the tsconfig.json file to transpile the TypeScript code, and copy the following ECMAScript configuration into it.

{
    "compilerOptions": {
      "module": "NodeNext",
      "target": "ES2022", // Supports top-level await
      "moduleResolution": "NodeNext",
      "skipLibCheck": true, // Avoid type errors from node_modules
      "strict": true // Enable strict type-checking options
    },
    "include": ["*.ts"]
}

Transpile from TypeScript to JavaScript.
```
tsc
```
Sign in to Azure with the following command:
```
az login
```
Run the code with the following command:
```
node to-audio.js
```

Create the to-audio.ts file with the following code:

import { writeFileSync } from "node:fs";
import { AzureOpenAI } from "openai";

// Set environment variables or edit the corresponding values here.
const endpoint: string = process.env.AZURE_OPENAI_ENDPOINT || "AZURE_OPENAI_ENDPOINT";
const apiKey: string = process.env.AZURE_OPENAI_API_KEY || "AZURE_OPENAI_API_KEY";
const apiVersion: string = "2025-01-01-preview"; 
const deployment: string = "gpt-4o-mini-audio-preview"; 

const client = new AzureOpenAI({ 
  endpoint, 
  apiKey, 
  apiVersion, 
  deployment 
});  

async function main(): Promise<void> {

    // Make the audio chat completions request
    const response = await client.chat.completions.create({ 
        model: "gpt-4o-mini-audio-preview", 
        modalities: ["text", "audio"], 
        audio: { voice: "alloy", format: "wav" }, 
        messages: [ 
        { 
            role: "user", 
            content: "Is a golden retriever a good family dog?" 
        } 
        ] 
    }); 

  // Inspect returned data 
  console.log(response.choices[0]); 

  // Write the output audio data to a file
  if (response.choices[0].message.audio) {
    // Buffer data is written as-is, so no encoding option is needed
    writeFileSync(
      "dog.wav",
      Buffer.from(response.choices[0].message.audio.data, "base64")
    );
  } else {
    console.error("Audio data is null or undefined.");
  }
}

main().catch((err: Error) => {
  console.error("Error occurred:", err);
});

export { main };

Create the tsconfig.json file to transpile the TypeScript code, and copy the following ECMAScript configuration into it.

{
    "compilerOptions": {
      "module": "NodeNext",
      "target": "ES2022", // Supports top-level await
      "moduleResolution": "NodeNext",
      "skipLibCheck": true, // Avoid type errors from node_modules
      "strict": true // Enable strict type-checking options
    },
    "include": ["*.ts"]
}

Transpile from TypeScript to JavaScript.
```
tsc
```
Run the code with the following command:
```
node to-audio.js
```

Wait a few moments to get the response.

Output for audio generation from text input

The script generates an audio file named dog.wav in the same directory as the script. The audio file contains the spoken response to the prompt, "Is a golden retriever a good family dog?"

Generate audio and text from audio input

Microsoft Entra ID
API key

Create the from-audio.ts file with the following code:

import { AzureOpenAI } from "openai";
import { writeFileSync } from "node:fs";
import { promises as fs } from 'fs';
import {
    DefaultAzureCredential,
    getBearerTokenProvider,
  } from "@azure/identity";

// Set environment variables or edit the corresponding values here.
const endpoint: string = process.env.AZURE_OPENAI_ENDPOINT || "AZURE_OPENAI_ENDPOINT";
const apiVersion: string = "2025-01-01-preview"; 
const deployment: string = "gpt-4o-mini-audio-preview"; 

// Keyless authentication 
const getClient = (): AzureOpenAI => {
    const credential = new DefaultAzureCredential();
    const scope = "https://cognitiveservices.azure.com/.default";
    const azureADTokenProvider = getBearerTokenProvider(credential, scope);
    const client = new AzureOpenAI({
      endpoint: endpoint,
      apiVersion: apiVersion,
      azureADTokenProvider,
    });
    return client;
};

const client = getClient();

async function main(): Promise<void> {

    // Buffer the audio for input to the chat completion
    const wavBuffer = await fs.readFile("dog.wav"); 
    const base64str = Buffer.from(wavBuffer).toString("base64"); 

    // Make the audio chat completions request
    const response = await client.chat.completions.create({ 
      model: deployment, // use the deployment name configured above
      modalities: ["text", "audio"], 
      audio: { voice: "alloy", format: "wav" },
      messages: [ 
        { 
          role: "user", 
          content: [ 
            { 
              type: "text", 
              text: "Describe in detail the spoken audio input." 
            }, 
            { 
              type: "input_audio", 
              input_audio: { 
                data: base64str, 
                format: "wav" 
              } 
            } 
          ] 
        } 
      ] 
    }); 

    console.log(response.choices[0]); 

    // Write the output audio data to a file
    if (response.choices[0].message.audio) {
        writeFileSync("analysis.wav", Buffer.from(response.choices[0].message.audio.data, "base64"));
    } else {
        console.error("Audio data is null or undefined.");
    }
}

main().catch((err: Error) => {
  console.error("Error occurred:", err);
});

export { main };

Create the tsconfig.json file to transpile the TypeScript code, and copy the following code for ECMAScript.

{
    "compilerOptions": {
      "module": "NodeNext",
      "target": "ES2022", // Supports top-level await
      "moduleResolution": "NodeNext",
      "skipLibCheck": true, // Avoid type errors from node_modules
      "strict": true // Enable strict type-checking options
    },
    "include": ["*.ts"]
}

Transpile from TypeScript to JavaScript.
```
tsc
```
Sign in to Azure with the following command:
```
az login
```
Run the code with the following command:
```
node from-audio.js
```

To use API key authentication instead, create the from-audio.ts file with the following code:

import { AzureOpenAI } from "openai";
import { writeFileSync } from "node:fs";
import { promises as fs } from 'fs';

// Set environment variables or edit the corresponding values here.
const endpoint: string = process.env.AZURE_OPENAI_ENDPOINT || "AZURE_OPENAI_ENDPOINT";
const apiKey: string = process.env.AZURE_OPENAI_API_KEY || "AZURE_OPENAI_API_KEY";
const apiVersion: string = "2025-01-01-preview"; 
const deployment: string = "gpt-4o-mini-audio-preview"; 

const client = new AzureOpenAI({ 
  endpoint, 
  apiKey, 
  apiVersion, 
  deployment 
});  

async function main(): Promise<void> {

  // Buffer the audio for input to the chat completion
  const wavBuffer = await fs.readFile("dog.wav"); 
  const base64str = Buffer.from(wavBuffer).toString("base64"); 

  // Make the audio chat completions request
  const response = await client.chat.completions.create({ 
    model: "gpt-4o-mini-audio-preview",
    modalities: ["text", "audio"], 
    audio: { voice: "alloy", format: "wav" },
    messages: [ 
      { 
        role: "user", 
        content: [ 
          { 
            type: "text", 
            text: "Describe in detail the spoken audio input." 
          }, 
          { 
            type: "input_audio", 
            input_audio: { 
              data: base64str, 
              format: "wav" 
            } 
          } 
        ] 
      } 
    ] 
  }); 

  console.log(response.choices[0]); 

  // Write the output audio data to a file
  if (response.choices[0].message.audio) {
    writeFileSync("analysis.wav", Buffer.from(response.choices[0].message.audio.data, "base64"));
  } else {
    console.error("Audio data is null or undefined.");
  }
}

main().catch((err: Error) => {
  console.error("Error occurred:", err);
});

export { main };

Create the tsconfig.json file to transpile the TypeScript code, and copy the following code for ECMAScript.

{
    "compilerOptions": {
      "module": "NodeNext",
      "target": "ES2022", // Supports top-level await
      "moduleResolution": "NodeNext",
      "skipLibCheck": true, // Avoid type errors from node_modules
      "strict": true // Enable strict type-checking options
    },
    "include": ["*.ts"]
}

Transpile from TypeScript to JavaScript.
```
tsc
```
Run the code with the following command:
```
node from-audio.js
```

Wait a few moments to get the response.

Output for audio and text generation from audio input

The script generates a summary of the spoken audio input. It also generates an audio file named analysis.wav in the same directory as the script. The audio file contains the spoken response to the prompt.
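The script prints the whole choice object, so the summary text has to be picked out of it. The sketch below shows one way to do that, assuming that when audio output is requested the text of the reply arrives in the transcript field of the audio object while content may be null. The AudioChoiceMessage interface and the getReplyText helper are simplified, hypothetical names for illustration, not the SDK's own types.

```typescript
// Simplified local view of the fields this sketch reads from a choice message.
// This interface is an assumption for illustration, not the SDK's full type.
interface AudioChoiceMessage {
  content: string | null;
  audio?: { id: string; data: string; transcript: string } | null;
}

// Prefer the text content; fall back to the transcript of the spoken reply.
function getReplyText(message: AudioChoiceMessage): string {
  return message.content ?? message.audio?.transcript ?? "";
}

// Usage with a hand-written message standing in for response.choices[0].message.
const example: AudioChoiceMessage = {
  content: null,
  audio: {
    id: "audio_123", // placeholder ID
    data: "",
    transcript: "The speaker describes golden retrievers.",
  },
};
console.log(getReplyText(example)); // The speaker describes golden retrievers.
```

Returning an empty string when neither field is present keeps the caller free of null checks; a stricter version could throw instead.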

Use audio generation and multi-turn chat completions

Create the multi-turn.ts file with the following code. This version uses Microsoft Entra ID (keyless) authentication; the API key version follows it.

import { AzureOpenAI } from "openai";
import { promises as fs } from 'fs';
import { ChatCompletionMessageParam } from "openai/resources/index.mjs";
import {
    DefaultAzureCredential,
    getBearerTokenProvider,
  } from "@azure/identity";

// Set environment variables or edit the corresponding values here.
const endpoint: string = process.env.AZURE_OPENAI_ENDPOINT || "AZURE_OPENAI_ENDPOINT";
const apiVersion: string = "2025-01-01-preview"; 
const deployment: string = "gpt-4o-mini-audio-preview"; 

// Keyless authentication 
const getClient = (): AzureOpenAI => {
    const credential = new DefaultAzureCredential();
    const scope = "https://cognitiveservices.azure.com/.default";
    const azureADTokenProvider = getBearerTokenProvider(credential, scope);
    const client = new AzureOpenAI({
      endpoint: endpoint,
      apiVersion: apiVersion,
      azureADTokenProvider,
    });
    return client;
};

const client = getClient(); 

async function main(): Promise<void> {

    // Buffer the audio for input to the chat completion
    const wavBuffer = await fs.readFile("dog.wav"); 
    const base64str = Buffer.from(wavBuffer).toString("base64"); 

    // Initialize messages with the first turn's user input 
    const messages: ChatCompletionMessageParam[] = [
      {
        role: "user",
        content: [
          { 
            type: "text", 
            text: "Describe in detail the spoken audio input." 
          },
          { 
            type: "input_audio", 
            input_audio: { 
              data: base64str, 
              format: "wav" 
            } 
          }
        ]
      }
    ];

    // Get the first turn's response 

    const response = await client.chat.completions.create({ 
        model: deployment, // use the deployment name configured above
        modalities: ["text", "audio"], 
        audio: { voice: "alloy", format: "wav" }, 
        messages: messages
    }); 

    console.log(response.choices[0]); 

    // Add a history message referencing the previous turn's audio by ID 
    messages.push({ 
        role: "assistant", 
        audio: response.choices[0].message.audio ? { id: response.choices[0].message.audio.id } : undefined
    });

    // Add a new user message for the second turn
    messages.push({ 
        role: "user", 
        content: [ 
            { 
              type: "text", 
              text: "Very concisely summarize the favorability." 
            } 
        ] 
    }); 

    // Send the follow-up request with the accumulated messages
    const followResponse = await client.chat.completions.create({ 
        model: deployment, // same deployment as the first turn
        messages: messages
    });

    console.log(followResponse.choices[0].message.content); 
}

main().catch((err: Error) => {
  console.error("Error occurred:", err);
});

export { main };

Create the tsconfig.json file to transpile the TypeScript code, and copy the following code for ECMAScript.

{
    "compilerOptions": {
      "module": "NodeNext",
      "target": "ES2022", // Supports top-level await
      "moduleResolution": "NodeNext",
      "skipLibCheck": true, // Avoid type errors from node_modules
      "strict": true // Enable strict type-checking options
    },
    "include": ["*.ts"]
}

Transpile from TypeScript to JavaScript.
```
tsc
```
Sign in to Azure with the following command:
```
az login
```
Run the code with the following command:
```
node multi-turn.js
```

To use API key authentication instead, create the multi-turn.ts file with the following code:

import { AzureOpenAI } from "openai";
import { promises as fs } from 'fs';
import { ChatCompletionMessageParam } from "openai/resources/index.mjs";

// Set environment variables or edit the corresponding values here.
const endpoint: string = process.env.AZURE_OPENAI_ENDPOINT || "AZURE_OPENAI_ENDPOINT";
const apiKey: string = process.env.AZURE_OPENAI_API_KEY || "AZURE_OPENAI_API_KEY";
const apiVersion: string = "2025-01-01-preview"; 
const deployment: string = "gpt-4o-mini-audio-preview"; 

const client = new AzureOpenAI({ 
  endpoint, 
  apiKey, 
  apiVersion, 
  deployment 
});  

async function main(): Promise<void> {

    // Buffer the audio for input to the chat completion
    const wavBuffer = await fs.readFile("dog.wav"); 
    const base64str = Buffer.from(wavBuffer).toString("base64"); 

    // Initialize messages with the first turn's user input 
    const messages: ChatCompletionMessageParam[] = [
      {
        role: "user",
        content: [
          { 
            type: "text", 
            text: "Describe in detail the spoken audio input." 
          },
          { 
            type: "input_audio", 
            input_audio: { 
              data: base64str, 
              format: "wav" 
            } 
          }
        ]
      }
    ];

    // Get the first turn's response 

    const response = await client.chat.completions.create({ 
      model: "gpt-4o-mini-audio-preview",
      modalities: ["text", "audio"], 
      audio: { voice: "alloy", format: "wav" }, 
      messages: messages
    }); 

    console.log(response.choices[0]); 

    // Add a history message referencing the previous turn's audio by ID 
    messages.push({ 
        role: "assistant", 
        audio: response.choices[0].message.audio ? { id: response.choices[0].message.audio.id } : undefined
    });

    // Add a new user message for the second turn
    messages.push({ 
        role: "user", 
        content: [ 
            { 
              type: "text", 
              text: "Very concisely summarize the favorability." 
            } 
        ] 
    }); 

    // Send the follow-up request with the accumulated messages
    const followResponse = await client.chat.completions.create({ 
        model: "gpt-4o-mini-audio-preview",
        messages: messages
    });

    console.log(followResponse.choices[0].message.content); 
}

main().catch((err: Error) => {
  console.error("Error occurred:", err);
});

export { main };

Create the tsconfig.json file to transpile the TypeScript code, and copy the following code for ECMAScript.

{
    "compilerOptions": {
      "module": "NodeNext",
      "target": "ES2022", // Supports top-level await
      "moduleResolution": "NodeNext",
      "skipLibCheck": true, // Avoid type errors from node_modules
      "strict": true // Enable strict type-checking options
    },
    "include": ["*.ts"]
}

Transpile from TypeScript to JavaScript.
```
tsc
```
Run the code with the following command:
```
node multi-turn.js
```

Wait a few moments to get the response.

Output for multi-turn chat completions

The script generates a text summary of the spoken audio input. Then it makes a multi-turn chat completion to briefly summarize the spoken audio input.
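The second turn works because the assistant's spoken reply is added to the history by its audio ID rather than by resending the audio data. The sketch below isolates that step as a pure function. The HistoryMessage type and the nextTurn helper are simplified, hypothetical names for illustration (the SDK's ChatCompletionMessageParam type is broader), and the audio ID is a placeholder.

```typescript
// Simplified message shapes for illustration only.
type TextPart = { type: "text"; text: string };
type HistoryMessage =
  | { role: "user"; content: TextPart[] }
  | { role: "assistant"; audio: { id: string } };

// Append the assistant's audio reply (referenced by ID) and the next user question.
// Returns a new array and leaves the original history untouched.
function nextTurn(
  history: HistoryMessage[],
  audioId: string,
  question: string
): HistoryMessage[] {
  return [
    ...history,
    { role: "assistant", audio: { id: audioId } },
    { role: "user", content: [{ type: "text", text: question }] },
  ];
}

const firstTurn: HistoryMessage[] = [
  {
    role: "user",
    content: [{ type: "text", text: "Describe in detail the spoken audio input." }],
  },
];
const secondTurn = nextTurn(
  firstTurn,
  "audio_abc123", // placeholder; the real ID comes from the first response
  "Very concisely summarize the favorability."
);
console.log(secondTurn.map((m) => m.role).join(" -> ")); // user -> assistant -> user
```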

Clean up resources

If you want to clean up and remove an Azure OpenAI resource, you can delete the resource. Before deleting the resource, you must first delete any deployed models.

Troubleshooting

Note

When you use gpt-4o-audio-preview for chat completions with the audio modality and stream is set to true, the only supported audio format is pcm16.
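Raw pcm16 data has no file header, so most players cannot open it directly. One way to handle streamed output is to collect the decoded chunks and prepend a WAV header afterward. The following is a minimal sketch under stated assumptions: mono, 24 kHz, little-endian 16-bit samples. These values are assumptions that you should check against your deployment, and pcm16ToWav is a hypothetical helper, not an SDK function.

```typescript
// Sketch: wrap raw PCM16 chunks in a 44-byte WAV header.
// Assumed defaults: 24 kHz sample rate, one channel, 16-bit little-endian samples.
function pcm16ToWav(chunks: Buffer[], sampleRate = 24000, channels = 1): Buffer {
  const pcm = Buffer.concat(chunks);
  const bytesPerSample = 2;
  const header = Buffer.alloc(44);
  header.write("RIFF", 0, "ascii");
  header.writeUInt32LE(36 + pcm.length, 4); // size of the file after this field
  header.write("WAVE", 8, "ascii");
  header.write("fmt ", 12, "ascii");
  header.writeUInt32LE(16, 16); // size of the fmt chunk for PCM
  header.writeUInt16LE(1, 20); // audio format 1 = uncompressed PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(sampleRate * channels * bytesPerSample, 28); // byte rate
  header.writeUInt16LE(channels * bytesPerSample, 32); // block align
  header.writeUInt16LE(16, 34); // bits per sample
  header.write("data", 36, "ascii");
  header.writeUInt32LE(pcm.length, 40);
  return Buffer.concat([header, pcm]);
}

// Usage with two silent chunks standing in for decoded stream data.
const wav = pcm16ToWav([Buffer.alloc(4800), Buffer.alloc(4800)]);
console.log(wav.length); // 9644
```

The result can be written to disk with writeFileSync in the same way as the non-streaming examples.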

Related content

Learn more about Azure OpenAI deployment types.
Learn more about Azure OpenAI quotas and limits.
