Tutorial: Decoding Audio

2023-06-13

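This tutorial shows how to use the Source Reader to decode audio from a media file and write the audio to a WAVE file. The tutorial is based on the Audio Clip sample.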

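Overview
Header and Library Files
Implementing wmain
Writing the WAVE File
Configuring the Source Reader
Writing the WAVE File Header
Calculating the Maximum Data Size
Decoding the Audio
Finishing the File Header
Related topics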

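Overview

In this tutorial, you will create a console application that takes two command-line arguments: the name of an input file that contains an audio stream, and the output file name. The application reads up to five seconds of audio data from the input file and writes the audio to the output file as WAVE data.

To get the decoded audio data, the application uses the Source Reader object. The Source Reader exposes the IMFSourceReader interface. To write the decoded audio to the WAVE file, the application uses Windows I/O functions. The following diagram illustrates this process.

(Diagram: the Source Reader gets audio data from the source file.)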

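In its simplest form, a WAVE file has the following structure:

Data type	Size (bytes)	Value
FOURCC	4	'RIFF'
DWORD	4	Total file size, not including the first 8 bytes
FOURCC	4	'WAVE'
FOURCC	4	'fmt '
DWORD	4	Size of the WAVEFORMATEX data that follows.
WAVEFORMATEX	Varies	Audio format header.
FOURCC	4	'data'
DWORD	4	Size of the audio data.
BYTE[]	Varies	Audio data.

Note

A FOURCC is a DWORD formed by concatenating four ASCII characters.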
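
As a minimal sketch of this idea, the following portable C++ packs four characters into a 32-bit value so that a little-endian write places the characters in the file in reading order. The names MakeFourCC and FourCCChar are illustrative and are not part of the tutorial's code; the Windows headers provide their own macros for this purpose.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the tutorial): packs four ASCII characters
// into a 32-bit value. The first character lands in the lowest byte, so a
// little-endian write puts the characters in the file in reading order.
constexpr std::uint32_t MakeFourCC(char c0, char c1, char c2, char c3)
{
    return  static_cast<std::uint32_t>(static_cast<unsigned char>(c0))        |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(c1)) << 8)  |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(c2)) << 16) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(c3)) << 24);
}

// Extracts the n-th character (0..3) back out of a FOURCC value.
inline char FourCCChar(std::uint32_t fcc, int n)
{
    assert(n >= 0 && n < 4);
    return static_cast<char>((fcc >> (8 * n)) & 0xFF);
}
```

With this packing, MakeFourCC('R', 'I', 'F', 'F') evaluates to 0x46464952, and the trailing space in 'fmt ' is a real character that occupies the fourth byte.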

This basic structure can be extended by adding file metadata and other information, which is beyond the scope of this tutorial.

Header and Library Files

Include the following header files in your project:

#define WINVER _WIN32_WINNT_WIN7

#include <windows.h>
#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#include <stdio.h>
#include <mferror.h>

Link to the following libraries:

mfplat.lib
mfreadwrite.lib
mfuuid.lib

Implementing wmain

The following code shows the entry-point function for the application.

int wmain(int argc, wchar_t* argv[])
{
    HeapSetInformation(NULL, HeapEnableTerminationOnCorruption, NULL, 0);

    if (argc != 3)
    {
        printf("arguments: input_file output_file.wav\n");
        return 1;
    }

    const WCHAR *wszSourceFile = argv[1];
    const WCHAR *wszTargetFile = argv[2];

    const LONG MAX_AUDIO_DURATION_MSEC = 5000; // 5 seconds

    HRESULT hr = S_OK;

    IMFSourceReader *pReader = NULL;
    HANDLE hFile = INVALID_HANDLE_VALUE;

    // Initialize the COM library.
    hr = CoInitializeEx(NULL, COINIT_APARTMENTTHREADED | COINIT_DISABLE_OLE1DDE);

    // Initialize the Media Foundation platform.
    if (SUCCEEDED(hr))
    {
        hr = MFStartup(MF_VERSION);
    }

    // Create the source reader to read the input file.
    if (SUCCEEDED(hr))
    {
        hr = MFCreateSourceReaderFromURL(wszSourceFile, NULL, &pReader);
        if (FAILED(hr))
        {
            printf("Error opening input file: %S\n", wszSourceFile);
        }
    }

    // Open the output file for writing.
    if (SUCCEEDED(hr))
    {
        hFile = CreateFile(wszTargetFile, GENERIC_WRITE, FILE_SHARE_READ, NULL,
            CREATE_ALWAYS, 0, NULL);

        if (hFile == INVALID_HANDLE_VALUE)
        {
            hr = HRESULT_FROM_WIN32(GetLastError());
            printf("Cannot create output file: %S\n", wszTargetFile);
        }
    }

    // Write the WAVE file.
    if (SUCCEEDED(hr))
    {
        hr = WriteWaveFile(pReader, hFile, MAX_AUDIO_DURATION_MSEC);
    }

    if (FAILED(hr))
    {
        printf("Failed, hr = 0x%X\n", hr);
    }

    // Clean up.
    if (hFile != INVALID_HANDLE_VALUE)
    {
        CloseHandle(hFile);
    }

    SafeRelease(&pReader);
    MFShutdown();
    CoUninitialize();

    return SUCCEEDED(hr) ? 0 : 1;
}

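This function does the following:

Calls CoInitializeEx to initialize the COM library.
Calls MFStartup to initialize the Media Foundation platform.
Calls MFCreateSourceReaderFromURL to create the Source Reader. This function takes the name of the input file and receives an IMFSourceReader interface pointer.
Creates the output file by calling the CreateFile function, which returns a file handle.
Calls the application-defined WriteWaveFile function. This function decodes the audio and writes the WAVE file.
Releases the IMFSourceReader pointer and the file handle.
Calls MFShutdown to shut down the Media Foundation platform.
Calls CoUninitialize to release the COM library.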
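
The code in this tutorial calls SafeRelease but does not show its definition. A common definition, given here as a sketch, releases an interface pointer if it is non-null and then sets the pointer to null. To keep the example self-contained, it is exercised with a stand-in type named FakeUnknown, which is hypothetical and only mimics the Release method of a COM object.

```cpp
#include <cassert>

// Releases the interface pointer if it is non-null, then clears it.
// Works with any type that has a Release() method, such as a COM interface.
template <class T>
void SafeRelease(T **ppT)
{
    assert(ppT != nullptr);
    if (*ppT)
    {
        (*ppT)->Release();
        *ppT = nullptr;
    }
}

// Hypothetical stand-in for a COM object, used only to demonstrate the helper.
struct FakeUnknown
{
    unsigned long refCount = 1;
    unsigned long Release() { return --refCount; }
};
```

Because the helper clears the pointer, calling it a second time on the same variable does nothing, which is why the cleanup code can call it unconditionally.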

Writing the WAVE File

Most of the work happens in the WriteWaveFile function, which is called from wmain.

//-------------------------------------------------------------------
// WriteWaveFile
//
// Writes a WAVE file by getting audio data from the source reader.
//
//-------------------------------------------------------------------

HRESULT WriteWaveFile(
    IMFSourceReader *pReader,   // Pointer to the source reader.
    HANDLE hFile,               // Handle to the output file.
    LONG msecAudioData          // Maximum amount of audio data to write, in msec.
    )
{
    HRESULT hr = S_OK;

    DWORD cbHeader = 0;         // Size of the WAVE file header, in bytes.
    DWORD cbAudioData = 0;      // Total bytes of PCM audio data written to the file.
    DWORD cbMaxAudioData = 0;

    IMFMediaType *pAudioType = NULL;    // Represents the PCM audio format.

    // Configure the source reader to get uncompressed PCM audio from the source file.

    hr = ConfigureAudioStream(pReader, &pAudioType);

    // Write the WAVE file header.
    if (SUCCEEDED(hr))
    {
        hr = WriteWaveHeader(hFile, pAudioType, &cbHeader);
    }

    // Calculate the maximum amount of audio to decode, in bytes.
    if (SUCCEEDED(hr))
    {
        cbMaxAudioData = CalculateMaxAudioDataSize(pAudioType, cbHeader, msecAudioData);

        // Decode audio data to the file.
        hr = WriteWaveData(hFile, pReader, cbMaxAudioData, &cbAudioData);
    }

    // Fix up the RIFF headers with the correct sizes.
    if (SUCCEEDED(hr))
    {
        hr = FixUpChunkSizes(hFile, cbHeader, cbAudioData);
    }

    SafeRelease(&pAudioType);
    return hr;
}

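This function calls a series of other application-defined functions:

The ConfigureAudioStream function initializes the Source Reader. This function receives a pointer to the IMFMediaType interface, which is used to get a description of the decoded audio format, including the sample rate, number of channels, and bit depth (bits per sample).
The WriteWaveHeader function writes the first part of the WAVE file, including the header and the start of the 'data' chunk.
The CalculateMaxAudioDataSize function calculates the maximum amount of audio to write to the file, in bytes.
The WriteWaveData function writes the PCM audio data to the file.
The FixUpChunkSizes function writes the file size information that appears after the 'RIFF' and 'data' FOURCC values in the WAVE file. (These values are not known until WriteWaveData completes.)

The rest of this tutorial describes these functions.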

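Configuring the Source Reader

The ConfigureAudioStream function configures the Source Reader to decode the audio stream in the source file. It also returns information about the format of the decoded audio.

In Media Foundation, media formats are described using media type objects. A media type object exposes the IMFMediaType interface, which inherits the IMFAttributes interface. Essentially, a media type is a collection of properties that describe a format. For more information, see Media Types.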

//-------------------------------------------------------------------
// ConfigureAudioStream
//
// Selects an audio stream from the source file, and configures the
// stream to deliver decoded PCM audio.
//-------------------------------------------------------------------

HRESULT ConfigureAudioStream(
    IMFSourceReader *pReader,   // Pointer to the source reader.
    IMFMediaType **ppPCMAudio   // Receives the audio format.
    )
{
    IMFMediaType *pUncompressedAudioType = NULL;
    IMFMediaType *pPartialType = NULL;

    // Select the first audio stream, and deselect all other streams.
    HRESULT hr = pReader->SetStreamSelection(
        (DWORD)MF_SOURCE_READER_ALL_STREAMS, FALSE);

    if (SUCCEEDED(hr))
    {
        hr = pReader->SetStreamSelection(
            (DWORD)MF_SOURCE_READER_FIRST_AUDIO_STREAM, TRUE);
    }

    // Create a partial media type that specifies uncompressed PCM audio.
    // (Checked so that a stream-selection failure above is not overwritten.)
    if (SUCCEEDED(hr))
    {
        hr = MFCreateMediaType(&pPartialType);
    }

    if (SUCCEEDED(hr))
    {
        hr = pPartialType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Audio);
    }

    if (SUCCEEDED(hr))
    {
        hr = pPartialType->SetGUID(MF_MT_SUBTYPE, MFAudioFormat_PCM);
    }

    // Set this type on the source reader. The source reader will
    // load the necessary decoder.
    if (SUCCEEDED(hr))
    {
        hr = pReader->SetCurrentMediaType(
            (DWORD)MF_SOURCE_READER_FIRST_AUDIO_STREAM,
            NULL, pPartialType);
    }

    // Get the complete uncompressed format.
    if (SUCCEEDED(hr))
    {
        hr = pReader->GetCurrentMediaType(
            (DWORD)MF_SOURCE_READER_FIRST_AUDIO_STREAM,
            &pUncompressedAudioType);
    }

    // Ensure the stream is selected.
    if (SUCCEEDED(hr))
    {
        hr = pReader->SetStreamSelection(
            (DWORD)MF_SOURCE_READER_FIRST_AUDIO_STREAM,
            TRUE);
    }

    // Return the PCM format to the caller.
    if (SUCCEEDED(hr))
    {
        *ppPCMAudio = pUncompressedAudioType;
        (*ppPCMAudio)->AddRef();
    }

    SafeRelease(&pUncompressedAudioType);
    SafeRelease(&pPartialType);
    return hr;
}

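The ConfigureAudioStream function does the following:

Calls the IMFSourceReader::SetStreamSelection method to select the audio stream and deselect all other streams. This step can improve performance, because it prevents the Source Reader from holding onto video frames that the application does not use.
Creates a partial media type that specifies PCM audio. The function creates the partial type as follows:
1. Calls MFCreateMediaType to create an empty media type object.
2. Sets the MF_MT_MAJOR_TYPE attribute to MFMediaType_Audio.
3. Sets the MF_MT_SUBTYPE attribute to MFAudioFormat_PCM.
Calls IMFSourceReader::SetCurrentMediaType to set the partial type on the Source Reader. If the source file contains encoded audio, the Source Reader automatically loads the necessary audio decoder.
Calls IMFSourceReader::GetCurrentMediaType to get the actual PCM media type. This method returns a media type with all of the format details filled in, such as the audio sample rate and the number of channels.
Calls IMFSourceReader::SetStreamSelection to enable the audio stream.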

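Writing the WAVE File Header

The WriteWaveHeader function writes the WAVE file header.

The only Media Foundation API called from this function is MFCreateWaveFormatExFromMFMediaType, which converts the media type to a WAVEFORMATEX structure.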

//-------------------------------------------------------------------
// WriteWaveHeader
//
// Write the WAVE file header.
//
// Note: This function writes placeholder values for the file size
// and data size, as these values will need to be filled in later.
//-------------------------------------------------------------------

HRESULT WriteWaveHeader(
    HANDLE hFile,               // Output file.
    IMFMediaType *pMediaType,   // PCM audio format.
    DWORD *pcbWritten           // Receives the size of the header.
    )
{
    HRESULT hr = S_OK;
    UINT32 cbFormat = 0;

    WAVEFORMATEX *pWav = NULL;

    *pcbWritten = 0;

    // Convert the PCM audio format into a WAVEFORMATEX structure.
    hr = MFCreateWaveFormatExFromMFMediaType(pMediaType, &pWav, &cbFormat);

    // Write the 'RIFF' header and the start of the 'fmt ' chunk.
    if (SUCCEEDED(hr))
    {
        DWORD header[] = {
            // RIFF header
            FCC('RIFF'),
            0,
            FCC('WAVE'),
            // Start of 'fmt ' chunk
            FCC('fmt '),
            cbFormat
        };

        DWORD dataHeader[] = { FCC('data'), 0 };

        hr = WriteToFile(hFile, header, sizeof(header));

        // Write the WAVEFORMATEX structure.
        if (SUCCEEDED(hr))
        {
            hr = WriteToFile(hFile, pWav, cbFormat);
        }

        // Write the start of the 'data' chunk

        if (SUCCEEDED(hr))
        {
            hr = WriteToFile(hFile, dataHeader, sizeof(dataHeader));
        }

        if (SUCCEEDED(hr))
        {
            *pcbWritten = sizeof(header) + cbFormat + sizeof(dataHeader);
        }
    }


    CoTaskMemFree(pWav);
    return hr;
}

The WriteToFile function is a simple helper function that wraps the Windows WriteFile function and returns an HRESULT value.

//-------------------------------------------------------------------
//
// Writes a block of data to a file
//
// hFile: Handle to the file.
// p: Pointer to the buffer to write.
// cb: Size of the buffer, in bytes.
//
//-------------------------------------------------------------------

HRESULT WriteToFile(HANDLE hFile, void* p, DWORD cb)
{
    DWORD cbWritten = 0;
    HRESULT hr = S_OK;

    BOOL bResult = WriteFile(hFile, p, cb, &cbWritten, NULL);
    if (!bResult)
    {
        hr = HRESULT_FROM_WIN32(GetLastError());
    }
    return hr;
}
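
The header size that WriteWaveHeader returns in pcbWritten can be worked out by hand, which is a useful check when inspecting an output file. The sketch below repeats that arithmetic in portable C++. The helper names are hypothetical, and the 18-byte figure used in the note that follows assumes a plain WAVEFORMATEX structure with no extra format bytes; the actual value of cbFormat depends on what MFCreateWaveFormatExFromMFMediaType returns.

```cpp
#include <cstdint>

// Sizes of the fixed parts that WriteWaveHeader writes, in bytes.
// 'RIFF', file size, 'WAVE', 'fmt ', format size:
constexpr std::uint32_t kRiffAndFmtStart = static_cast<std::uint32_t>(5 * sizeof(std::uint32_t));
// 'data', data size:
constexpr std::uint32_t kDataStart = static_cast<std::uint32_t>(2 * sizeof(std::uint32_t));

// Hypothetical helper: total header size for a format block of cbFormat bytes.
constexpr std::uint32_t WaveHeaderSize(std::uint32_t cbFormat)
{
    return kRiffAndFmtStart + cbFormat + kDataStart;
}

// Byte offset of the 'data' chunk's size field, which is the last DWORD
// of the header.
constexpr std::uint32_t DataSizeFieldOffset(std::uint32_t cbFormat)
{
    return WaveHeaderSize(cbFormat) - static_cast<std::uint32_t>(sizeof(std::uint32_t));
}
```

Under the 18-byte assumption, WaveHeaderSize(18) is 46 and the data size field starts at byte offset 42.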

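Calculating the Maximum Data Size

Because the file size is stored as a 4-byte value in the file header, the maximum size of a WAVE file is 0xFFFFFFFF bytes, approximately 4 GB. This value includes the size of the file header. PCM audio has a constant bit rate, so you can calculate the maximum data size from the audio format, as follows: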

//-------------------------------------------------------------------
// CalculateMaxAudioDataSize
//
// Calculates how much audio to write to the WAVE file, given the
// audio format and the maximum duration of the WAVE file.
//-------------------------------------------------------------------

DWORD CalculateMaxAudioDataSize(
    IMFMediaType *pAudioType,    // The PCM audio format.
    DWORD cbHeader,              // The size of the WAVE file header.
    DWORD msecAudioData          // Maximum duration, in milliseconds.
    )
{
    UINT32 cbBlockSize = 0;         // Audio frame size, in bytes.
    UINT32 cbBytesPerSecond = 0;    // Bytes per second.

    // Get the audio block size and number of bytes/second from the audio format.

    cbBlockSize = MFGetAttributeUINT32(pAudioType, MF_MT_AUDIO_BLOCK_ALIGNMENT, 0);
    cbBytesPerSecond = MFGetAttributeUINT32(pAudioType, MF_MT_AUDIO_AVG_BYTES_PER_SECOND, 0);

    // Calculate the maximum amount of audio data to write.
    // This value equals (duration in seconds x bytes/second), but cannot
    // exceed the maximum size of the data chunk in the WAVE file.

    // Size of the desired audio clip in bytes:
    DWORD cbAudioClipSize = (DWORD)MulDiv(cbBytesPerSecond, msecAudioData, 1000);

    // Largest possible size of the data chunk:
    DWORD cbMaxSize = MAXDWORD - cbHeader;

    // Maximum size altogether.
    cbAudioClipSize = min(cbAudioClipSize, cbMaxSize);

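    // Round down to the audio block size, so that we do not write a partial audio frame.
    // Skip the rounding if the format does not report a block size (avoids division by zero).
    if (cbBlockSize > 0)
    {
        cbAudioClipSize = (cbAudioClipSize / cbBlockSize) * cbBlockSize;
    }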

    return cbAudioClipSize;
}

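To avoid writing a partial audio frame, the size is rounded down to a multiple of the block alignment, which is stored in the MF_MT_AUDIO_BLOCK_ALIGNMENT attribute.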
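
To make the arithmetic concrete, here is a portable sketch of the same calculation that takes the format values as plain integers instead of reading them from an IMFMediaType. The function name MaxAudioDataSize is hypothetical. It uses 64-bit integer arithmetic in place of MulDiv, so the intermediate division truncates rather than rounds, which is an assumption of this sketch and not the tutorial's exact behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical portable variant of CalculateMaxAudioDataSize.
inline std::uint32_t MaxAudioDataSize(
    std::uint32_t bytesPerSecond,   // Average bytes per second.
    std::uint32_t blockAlign,       // Bytes per audio frame.
    std::uint32_t cbHeader,         // Size of the WAVE file header.
    std::uint32_t msecAudioData)    // Requested duration, in milliseconds.
{
    assert(blockAlign != 0);

    // Requested clip size, computed in 64 bits so the product cannot overflow.
    std::uint64_t cbClip =
        static_cast<std::uint64_t>(bytesPerSecond) * msecAudioData / 1000;

    // The header and the data together must fit in a 32-bit size.
    const std::uint64_t cbMax = 0xFFFFFFFFull - cbHeader;
    cbClip = std::min(cbClip, cbMax);

    // Round down to a whole number of audio frames.
    return static_cast<std::uint32_t>((cbClip / blockAlign) * blockAlign);
}
```

For example, 16-bit stereo audio at 44.1 kHz has 176,400 bytes per second and a block alignment of 4, so a 5,000 ms clip comes to 882,000 bytes, which is already a whole number of frames.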

Decoding the Audio

The WriteWaveData function reads decoded audio from the source file and writes it to the WAVE file.

//-------------------------------------------------------------------
// WriteWaveData
//
// Decodes PCM audio data from the source file and writes it to
// the WAVE file.
//-------------------------------------------------------------------

HRESULT WriteWaveData(
    HANDLE hFile,               // Output file.
    IMFSourceReader *pReader,   // Source reader.
    DWORD cbMaxAudioData,       // Maximum amount of audio data (bytes).
    DWORD *pcbDataWritten       // Receives the amount of data written.
    )
{
    HRESULT hr = S_OK;
    DWORD cbAudioData = 0;
    DWORD cbBuffer = 0;
    BYTE *pAudioData = NULL;

    IMFSample *pSample = NULL;
    IMFMediaBuffer *pBuffer = NULL;

    // Get audio samples from the source reader.
    while (true)
    {
        DWORD dwFlags = 0;

        // Read the next sample.
        hr = pReader->ReadSample(
            (DWORD)MF_SOURCE_READER_FIRST_AUDIO_STREAM,
            0, NULL, &dwFlags, NULL, &pSample );

        if (FAILED(hr)) { break; }

        if (dwFlags & MF_SOURCE_READERF_CURRENTMEDIATYPECHANGED)
        {
            printf("Type change - not supported by WAVE file format.\n");
            break;
        }
        if (dwFlags & MF_SOURCE_READERF_ENDOFSTREAM)
        {
            printf("End of input file.\n");
            break;
        }

        if (pSample == NULL)
        {
            printf("No sample\n");
            continue;
        }

        // Get a pointer to the audio data in the sample.

        hr = pSample->ConvertToContiguousBuffer(&pBuffer);

        if (FAILED(hr)) { break; }


        hr = pBuffer->Lock(&pAudioData, NULL, &cbBuffer);

        if (FAILED(hr)) { break; }


        // Make sure not to exceed the specified maximum size.
        if (cbMaxAudioData - cbAudioData < cbBuffer)
        {
            cbBuffer = cbMaxAudioData - cbAudioData;
        }

        // Write this data to the output file.
        hr = WriteToFile(hFile, pAudioData, cbBuffer);

        if (FAILED(hr)) { break; }

        // Unlock the buffer.
        hr = pBuffer->Unlock();
        pAudioData = NULL;

        if (FAILED(hr)) { break; }

        // Update running total of audio data.
        cbAudioData += cbBuffer;

        if (cbAudioData >= cbMaxAudioData)
        {
            break;
        }

        SafeRelease(&pSample);
        SafeRelease(&pBuffer);
    }

    if (SUCCEEDED(hr))
    {
        printf("Wrote %lu bytes of audio data.\n", cbAudioData);

        *pcbDataWritten = cbAudioData;
    }

    if (pAudioData)
    {
        pBuffer->Unlock();
    }

    SafeRelease(&pBuffer);
    SafeRelease(&pSample);
    return hr;
}

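The WriteWaveData function does the following in a loop:

Calls IMFSourceReader::ReadSample to read audio from the source file. The dwFlags parameter receives a bitwise OR of flags from the MF_SOURCE_READER_FLAG enumeration. The pSample parameter receives a pointer to the IMFSample interface, which is used to access the audio data. In some cases a call to ReadSample does not generate data, in which case the IMFSample pointer is NULL.
Checks dwFlags for the following flags:
- MF_SOURCE_READERF_CURRENTMEDIATYPECHANGED. This flag indicates a format change in the source file. WAVE files do not support format changes.
- MF_SOURCE_READERF_ENDOFSTREAM. This flag indicates the end of the stream.
Calls IMFSample::ConvertToContiguousBuffer to get a pointer to a buffer object.
Calls IMFMediaBuffer::Lock to get a pointer to the buffer memory.
Writes the audio data to the output file.
Calls IMFMediaBuffer::Unlock to unlock the buffer object.

The function breaks out of the loop when any of the following occurs:

The stream format changes.
The end of the stream is reached.
The maximum amount of audio data has been written to the output file.
An error occurs.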
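
The size-limiting part of the loop can be studied in isolation. The following sketch simulates it in portable C++: given the sizes of the buffers delivered by successive reads, it returns the total number of bytes that would be written. The function name TotalBytesWritten and the use of a vector of sizes are assumptions made for illustration; the real loop obtains each size from IMFMediaBuffer::Lock.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical simulation of the clamping logic in WriteWaveData.
inline std::uint32_t TotalBytesWritten(
    const std::vector<std::uint32_t> &bufferSizes,  // Size of each decoded buffer.
    std::uint32_t cbMaxAudioData)                   // Maximum bytes to write.
{
    std::uint32_t cbAudioData = 0;

    for (std::uint32_t cbBuffer : bufferSizes)
    {
        // Truncate the final buffer so the running total never exceeds the limit.
        if (cbMaxAudioData - cbAudioData < cbBuffer)
        {
            cbBuffer = cbMaxAudioData - cbAudioData;
        }

        cbAudioData += cbBuffer;
        assert(cbAudioData <= cbMaxAudioData);

        // Stop as soon as the limit is reached.
        if (cbAudioData >= cbMaxAudioData)
        {
            break;
        }
    }
    return cbAudioData;
}
```

Writing the comparison as cbMaxAudioData - cbAudioData < cbBuffer, rather than cbAudioData + cbBuffer > cbMaxAudioData, avoids unsigned overflow when the limit is close to the 32-bit maximum.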

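Finishing the File Header

The size values stored in the WAVE header are not known until the previous function completes. The FixUpChunkSizes function fills in these values: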

//-------------------------------------------------------------------
// FixUpChunkSizes
//
// Writes the file-size information into the WAVE file header.
//
// WAVE files use the RIFF file format. Each RIFF chunk has a data
// size, and the RIFF header has a total file size.
//-------------------------------------------------------------------

HRESULT FixUpChunkSizes(
    HANDLE hFile,           // Output file.
    DWORD cbHeader,         // Size of the WAVE file header.
    DWORD cbAudioData       // Size of the 'data' chunk.
    )
{
    HRESULT hr = S_OK;

    LARGE_INTEGER ll;
    ll.QuadPart = cbHeader - sizeof(DWORD);

    if (0 == SetFilePointerEx(hFile, ll, NULL, FILE_BEGIN))
    {
        hr = HRESULT_FROM_WIN32(GetLastError());
    }

    // Write the data size.

    if (SUCCEEDED(hr))
    {
        hr = WriteToFile(hFile, &cbAudioData, sizeof(cbAudioData));
    }

    if (SUCCEEDED(hr))
    {
        // Write the file size.
        ll.QuadPart = sizeof(FOURCC);

        if (0 == SetFilePointerEx(hFile, ll, NULL, FILE_BEGIN))
        {
            hr = HRESULT_FROM_WIN32(GetLastError());
        }
    }

    if (SUCCEEDED(hr))
    {
        DWORD cbRiffFileSize = cbHeader + cbAudioData - 8;

        // NOTE: The "size" field in the RIFF header does not include
        // the first 8 bytes of the file. (That is, the size of the
        // data that appears after the size field.)

        hr = WriteToFile(hFile, &cbRiffFileSize, sizeof(cbRiffFileSize));
    }

    return hr;
}
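
The two offsets that FixUpChunkSizes seeks to can be checked with an in-memory analogue. The sketch below patches a byte vector instead of a file; the helper names are hypothetical, and the explicit little-endian write reflects the byte order that RIFF files use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Writes a 32-bit value at the given byte offset in little-endian order.
inline void PatchUint32LE(std::vector<std::uint8_t> &file,
                          std::size_t offset, std::uint32_t value)
{
    assert(offset + 4 <= file.size());
    for (int i = 0; i < 4; ++i)
    {
        file[offset + i] = static_cast<std::uint8_t>((value >> (8 * i)) & 0xFF);
    }
}

// Hypothetical in-memory analogue of FixUpChunkSizes.
inline void FixUpChunkSizesInMemory(std::vector<std::uint8_t> &file,
                                    std::uint32_t cbHeader,
                                    std::uint32_t cbAudioData)
{
    // The 'data' size field is the last DWORD of the header.
    PatchUint32LE(file, cbHeader - 4, cbAudioData);

    // The RIFF size field follows the 4-byte 'RIFF' FOURCC and excludes
    // the first 8 bytes of the file.
    PatchUint32LE(file, 4, cbHeader + cbAudioData - 8);
}
```

For a 46-byte header followed by 100 bytes of audio, the data size field at offset 42 receives 100 and the RIFF size field at offset 4 receives 138.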

Related topics

Audio Media Types
Source Reader
IMFSourceReader
