How to use the audio input stream

2025-03-10

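The Speech SDK provides a way to stream audio into the recognizer as an alternative to microphone or file input.

This guide describes how to use audio input streams. It also describes some of the requirements and limitations of the audio input stream.

See more examples of speech to text recognition with an audio input stream on GitHub.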

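Identify the format of the audio stream

First, identify the format of your audio stream.

The supported audio format is:

PCM format (int-16, signed)
One channel
16 bits per sample, 8,000 or 16,000 samples per second (16,000 bytes or 32,000 bytes per second)
Two-byte block alignment (16 bits, including any padding for a sample)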
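
The byte rates quoted in the list follow directly from the sample rate, the sample width, and the channel count. The sketch below only illustrates that arithmetic. `PcmFormatMath` and its methods are hypothetical helper names and are not part of the Speech SDK.

```csharp
using System;

// Hypothetical helper (not part of the Speech SDK): derives the block
// alignment and byte rate of an uncompressed PCM stream from its format.
public static class PcmFormatMath
{
    // Bytes per sample frame: one sample for each channel.
    public static int BlockAlign(int bitsPerSample, int channels)
    {
        return (bitsPerSample / 8) * channels;
    }

    // Bytes of raw audio produced per second of sound.
    public static int BytesPerSecond(int samplesPerSecond, int bitsPerSample, int channels)
    {
        return samplesPerSecond * BlockAlign(bitsPerSample, channels);
    }
}
```

For the supported format, `PcmFormatMath.BlockAlign(16, 1)` is 2, `PcmFormatMath.BytesPerSecond(8000, 16, 1)` is 16000, and `PcmFormatMath.BytesPerSecond(16000, 16, 1)` is 32000, which matches the figures in the list.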

The corresponding code in the SDK to create the audio format looks like this example:

byte channels = 1;
byte bitsPerSample = 16;
uint samplesPerSecond = 16000; // or 8000
var audioFormat = AudioStreamFormat.GetWaveFormatPCM(samplesPerSecond, bitsPerSample, channels);

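Make sure that your code provides the raw audio data according to these specifications. Also make sure that 16-bit samples arrive in little-endian format. If your audio source data doesn't match the supported format, you must transcode the audio into the required format.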
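
As a rough illustration of the little-endian requirement, the following sketch packs signed 16-bit samples into bytes with the low byte first, and swaps byte pairs in a buffer that arrived in big-endian order. `PcmSampleEncoder` and its methods are hypothetical names used only for this example. The sketch covers byte order only and doesn't resample or change the channel count.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper (not part of the Speech SDK).
public static class PcmSampleEncoder
{
    // Packs signed 16-bit samples into a byte array, low byte first.
    public static byte[] ToLittleEndianBytes(short[] samples)
    {
        var bytes = new byte[samples.Length * sizeof(short)];
        for (int i = 0; i < samples.Length; i++)
        {
            BinaryPrimitives.WriteInt16LittleEndian(bytes.AsSpan(i * 2, 2), samples[i]);
        }
        return bytes;
    }

    // Swaps each byte pair in place, for 16-bit data that arrived big-endian.
    // A trailing odd byte, if any, is left untouched.
    public static void SwapByteOrder16(byte[] buffer)
    {
        for (int i = 0; i + 1 < buffer.Length; i += 2)
        {
            (buffer[i], buffer[i + 1]) = (buffer[i + 1], buffer[i]);
        }
    }
}
```

For example, the sample value 0x1234 is emitted as the bytes 0x34 followed by 0x12.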

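Create your own audio input stream class

You can create your own audio input stream class derived from PullAudioInputStreamCallback. Implement the Read() and Close() members. The exact function signature is language-dependent, but the code looks similar to this code sample: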

public class ContosoAudioStream : PullAudioInputStreamCallback 
{
    public ContosoAudioStream() {}

    public override int Read(byte[] buffer, uint size) 
    {
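        // Copy up to `size` bytes of audio data into `buffer` and return
        // the number of bytes written. Returning 0 signals the end of the stream.
        // E.g., return read(config.YYY, buffer, size);
        return 0;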
    }

    public override void Close() 
    {
        // Close and clean up resources.
    }
}
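
The skeleton above leaves the body of Read() open. One possible way to fill it in, shown here only as a sketch, is to serve bytes from any System.IO.Stream. `StreamBackedAudioStream` is a hypothetical class name, and the sketch assumes the stream already contains headerless PCM data in the supported format. A WAV file header, for instance, would have to be skipped first.

```csharp
using System;
using System.IO;
using Microsoft.CognitiveServices.Speech.Audio;

// Hypothetical example class: feeds raw PCM bytes from a Stream to the recognizer.
public sealed class StreamBackedAudioStream : PullAudioInputStreamCallback
{
    private readonly Stream _source;

    public StreamBackedAudioStream(Stream source)
    {
        _source = source ?? throw new ArgumentNullException(nameof(source));
    }

    public override int Read(byte[] buffer, uint size)
    {
        // Never request more bytes than the buffer can hold.
        int count = (int)Math.Min(size, (uint)buffer.Length);

        // Stream.Read returns 0 when the source is exhausted, which is
        // also the value that signals the end of the audio stream.
        return _source.Read(buffer, 0, count);
    }

    public override void Close()
    {
        // Release the underlying stream when the SDK is done with it.
        _source.Dispose();
    }
}
```

An instance could then be constructed over a file of raw audio, for example `new StreamBackedAudioStream(File.OpenRead("audio.raw"))`, where `audio.raw` is a placeholder file name.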

Create an audio configuration based on your audio format and custom audio input stream. For example:

var audioConfig = AudioConfig.FromStreamInput(new ContosoAudioStream(), audioFormat);

Here's how the custom audio input stream is used in the context of a speech recognizer:

using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

public class ContosoAudioStream : PullAudioInputStreamCallback 
{
    public ContosoAudioStream() {}

    public override int Read(byte[] buffer, uint size) 
    {
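        // Copy up to `size` bytes of audio data into `buffer` and return
        // the number of bytes written. Returning 0 signals the end of the stream.
        // E.g., return read(config.YYY, buffer, size);
        return 0;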
    }

    public override void Close() 
    {
        // Close and clean up resources.
    }
}

class Program 
{
    static string speechKey = Environment.GetEnvironmentVariable("SPEECH_KEY");
    static string speechRegion = Environment.GetEnvironmentVariable("SPEECH_REGION");

    static async Task Main(string[] args)
    {
        byte channels = 1;
        byte bitsPerSample = 16;
        uint samplesPerSecond = 16000; // or 8000
        var audioFormat = AudioStreamFormat.GetWaveFormatPCM(samplesPerSecond, bitsPerSample, channels);
        var audioConfig = AudioConfig.FromStreamInput(new ContosoAudioStream(), audioFormat);

        var speechConfig = SpeechConfig.FromSubscription(speechKey, speechRegion); 
        speechConfig.SpeechRecognitionLanguage = "en-US";
        var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);

        Console.WriteLine("Recognizing speech from the custom audio input stream.");
        var speechRecognitionResult = await speechRecognizer.RecognizeOnceAsync();
        Console.WriteLine($"RECOGNIZED: Text={speechRecognitionResult.Text}");
    }
}
