AAC Decoder

08/25/2021

The Microsoft Media Foundation AAC decoder is a Media Foundation Transform that decodes the following Advanced Audio Coding (AAC) and High Efficiency AAC (HE-AAC) profiles:

MPEG-2 AAC Low Complexity (LC) profile (multichannel).
MPEG-4 HE-AAC v1 (multichannel) with AAC-LC core.
MPEG-4 HE-AAC v2 (stereo) with AAC-LC core.

The AAC decoder supports both raw AAC streams with no headers and AAC in an audio data transport stream (ADTS).

Starting in Windows 8, the AAC decoder also supports decoding MPEG-4 audio transport streams with a multiplex layer (LATM) and synchronization layer (LOAS). It can also convert an LATM/LOAS stream to ADTS.

Class Identifier

The class identifier (CLSID) of the AAC decoder is CLSID_CMSAACDecMFT, defined in the header file wmcodecdsp.h.

Media Types

The AAC decoder supports the following media types.

Input Types

The AAC decoder supports the following audio subtypes:

Subtype	Description	Header
MFAudioFormat_AAC	Raw AAC or ADTS AAC. For this subtype, the media type gives the sample rate and number of channels prior to the application of spectral band replication (SBR) and parametric stereo (PS) tools, if present. The effect of the SBR tool is to double the decoded sample rate relative to the core AAC-LC sample rate. The effect of the PS tool is to decode stereo from a mono-channel core AAC-LC stream. This subtype is equivalent to MEDIASUBTYPE_MPEG_HEAAC, defined in wmcodecdsp.h. See Audio Subtype GUIDs. The MPEG-4 File Source and the ADTS Parser output this subtype.	mfapi.h
MEDIASUBTYPE_RAW_AAC1	Raw AAC. This subtype is used for AAC contained in an AVI file with the audio format tag equal to WAVE_FORMAT_RAW_AAC1 (0x00FF). For this subtype, the media type gives the sample rate and number of channels after the SBR and PS tools are applied, if present.	wmcodecdsp.h
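The difference between the two subtypes is easiest to see with numbers. The following is a minimal sketch, not part of the decoder's API: the AudioFormat struct and the DecodedFormat helper are hypothetical names used only for illustration. It derives the decoded format from the core AAC-LC format by applying the SBR and PS effects described in the table.

```cpp
#include <cstdint>

// Sample rate and channel count, as carried by a media type.
struct AudioFormat {
    std::uint32_t samplesPerSecond;
    std::uint32_t channels;
};

// Hypothetical helper: maps the core AAC-LC format (what an MFAudioFormat_AAC
// media type describes) to the format after the SBR and PS tools are applied
// (what a MEDIASUBTYPE_RAW_AAC1 media type describes).
AudioFormat DecodedFormat(AudioFormat core, bool hasSbr, bool hasPs) {
    AudioFormat out = core;
    if (hasSbr) {
        out.samplesPerSecond *= 2;  // SBR doubles the core sample rate.
    }
    if (hasPs && core.channels == 1) {
        out.channels = 2;           // PS decodes stereo from a mono core stream.
    }
    return out;
}
```

For example, an HE-AAC v2 stream with a 24-kHz mono core would be described as 24000 Hz and 1 channel under MFAudioFormat_AAC, and as 48000 Hz and 2 channels under MEDIASUBTYPE_RAW_AAC1.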

To configure the AAC decoder, set the following attributes on the input media type.

Attribute	Description	Remarks
MF_MT_MAJOR_TYPE	Major type.	Must be MFMediaType_Audio.
MF_MT_SUBTYPE	Audio subtype.	Refer to the previous description for details.
MF_MT_AAC_AUDIO_PROFILE_LEVEL_INDICATION	Audio profile and level.	Optional. Applies only to MFAudioFormat_AAC. The value of this attribute is the audioProfileLevelIndication field, as defined by ISO/IEC 14496-3. If unknown, set to zero or 0xFE ("no audio profile specified").
MF_MT_AAC_PAYLOAD_TYPE	Payload type.	Applies only to MFAudioFormat_AAC. The decoder supports the following payload types: 0: Raw AAC. The stream contains raw_data_block() elements only, as defined by MPEG-2. 1: ADTS. The stream contains an adts_sequence(), as defined by MPEG-2. Only one raw_data_block() per adts_frame() is allowed. 3: Audio transport stream with a synchronization layer (LOAS) and a multiplex layer (LATM). Of the three types of LOAS, only AudioSyncStream is supported. The multiplex layer is AudioMuxElement, restricted to one audio program and one layer. MF_MT_AAC_PAYLOAD_TYPE is optional. If this attribute is not specified, the default value 0 is used, which specifies the stream contains raw_data_block elements only.
MF_MT_AUDIO_BITS_PER_SAMPLE	Desired bit depth of the decoded PCM audio.
MF_MT_AUDIO_CHANNEL_MASK	Specifies the assignment of audio channels to speaker positions.	Optional. For more information, see Format Constraints.
MF_MT_AUDIO_NUM_CHANNELS	Number of channels, including the low frequency (LFE) channel, if present.	The interpretation of this value depends on the media subtype, as described previously.
MF_MT_AUDIO_SAMPLES_PER_SECOND	Sample rate, in samples per second.	The interpretation of this value depends on the media subtype, as described previously.
MF_MT_USER_DATA	Additional format information.	The value of this attribute depends on the subtype. MFAudioFormat_AAC: Contains the portion of the HEAACWAVEINFO structure that appears after the WAVEFORMATEX structure (that is, after the wfx member). This is followed by the AudioSpecificConfig() data, as defined by ISO/IEC 14496-3. MEDIASUBTYPE_RAW_AAC1: Contains the AudioSpecificConfig() data. This data must appear; otherwise, the decoder will reject the media type. The length of the AudioSpecificConfig() data is 2 bytes for AAC-LC or HE-AAC with implicit signaling of SBR/PS. It is more than 2 bytes for HE-AAC with explicit signaling of SBR/PS. The value of audioObjectType as defined in AudioSpecificConfig() must be 2, indicating AAC-LC. The value of extensionAudioObjectType must be 5 for SBR or 29 for PS.
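Before setting MF_MT_USER_DATA, an application may want to check that the AudioSpecificConfig() data describes an AAC-LC core. The following sketch reads the leading fields of AudioSpecificConfig() under the bit layout defined by ISO/IEC 14496-3. The AscHeader struct and both function names are hypothetical. The sketch assumes the common case with no escape codes and does not parse the SBR/PS extension fields used with explicit signaling.

```cpp
#include <cstddef>
#include <cstdint>

// Leading fields of AudioSpecificConfig().
struct AscHeader {
    unsigned audioObjectType;         // 5 bits
    unsigned samplingFrequencyIndex;  // 4 bits
    unsigned channelConfiguration;    // 4 bits
};

// Reads the first 13 bits of AudioSpecificConfig(). Returns false if the
// buffer is too short or if an escape code is present (audioObjectType 31 or
// samplingFrequencyIndex 15), because those change the field layout.
bool ParseAscHeader(const std::uint8_t* data, std::size_t size, AscHeader* out) {
    if (data == nullptr || out == nullptr || size < 2) {
        return false;
    }
    const unsigned bits = (static_cast<unsigned>(data[0]) << 8) | data[1];
    out->audioObjectType        = (bits >> 11) & 0x1F;
    out->samplingFrequencyIndex = (bits >> 7) & 0x0F;
    out->channelConfiguration   = (bits >> 3) & 0x0F;
    return out->audioObjectType != 31 && out->samplingFrequencyIndex != 15;
}

// The decoder requires audioObjectType to be 2 (AAC-LC).
bool IsAacLcCore(const AscHeader& header) {
    return header.audioObjectType == 2;
}
```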

Output Types

The decoder supports the following output types:

Subtype	Description
MFAudioFormat_Float	IEEE floating-point audio.
MFAudioFormat_PCM	16-bit PCM audio.
MFAudioFormat_AAC	Requires Windows 8. This output type can be used to convert an AAC stream in the LOAS/LATM format to ADTS format. To convert an LOAS/LATM stream to an ADTS stream, set the input type to MFAudioFormat_AAC with payload type 3 (LOAS). Then set the output type to MFAudioFormat_AAC with payload type 1 (ADTS). The decoder will reformat the container without decoding the bitstream. Note: The decoder does not register MFAudioFormat_AAC as an output type. However, if the application sets the input type as described, the IMFTransform::GetOutputAvailableType method returns MFAudioFormat_AAC in the list of available output types.

If the input stream contains more than two channels, the AAC decoder provides two options for the output format:

The same channel configuration as the input type.
Stereo fold-down.

Format Constraints

The decoded audio sampling rate must be one of the following, after SBR is applied (if present):

8 kHz
11.025 kHz
12 kHz
16 kHz
22.05 kHz
24 kHz
32 kHz
44.1 kHz
48 kHz

Sampling rates above 48 kHz are not supported.
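An application can apply this constraint before it offers a media type to the decoder. The following is a minimal sketch; both function names are hypothetical. The second helper accounts for the fact that the constraint applies after SBR, which doubles the core rate.

```cpp
#include <cstdint>

// Returns true if the rate (in Hz) is one of the decoded sampling rates
// listed above.
bool IsSupportedDecodedSampleRate(std::uint32_t hz) {
    static const std::uint32_t kRates[] = {
        8000, 11025, 12000, 16000, 22050, 24000, 32000, 44100, 48000};
    for (std::uint32_t rate : kRates) {
        if (rate == hz) {
            return true;
        }
    }
    return false;
}

// Checks a core AAC-LC rate, taking the SBR doubling into account.
bool IsSupportedCoreSampleRate(std::uint32_t coreHz, bool hasSbr) {
    return IsSupportedDecodedSampleRate(hasSbr ? coreHz * 2 : coreHz);
}
```

Under this check, a 24-kHz core with SBR passes because it decodes to 48 kHz, while a 48-kHz core with SBR is rejected because it would decode to 96 kHz.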

The decoder supports up to 6 audio channels. For each speaker configuration, the decoder expects the AAC syntactic elements to appear in a certain order. The following table lists the supported speaker configurations. The third column of the table lists the expected syntactic elements and their order, using the following notation:

<SCE1>: The single_channel_element (SCE) associated with the front center speaker.
<SCE2>: The SCE associated with the back center speaker.
<CPE1>: The channel_pair_element (CPE) associated with the front speakers.
<CPE2>: The CPE associated with the back (or side) speakers.
<LFE>: The lfe_channel_element (LFE).

For more information about these syntactic elements, refer to ISO/IEC 13818-7.

Configuration	Channel Mask	AAC Syntactic Elements
Mono	SPEAKER_FRONT_CENTER	<SCE1>
Stereo or dual mono	SPEAKER_FRONT_LEFT | SPEAKER_FRONT_RIGHT	<CPE1>
2/1	SPEAKER_FRONT_LEFT | SPEAKER_FRONT_RIGHT | SPEAKER_BACK_CENTER	<CPE1><SCE1>
2/2	SPEAKER_FRONT_LEFT | SPEAKER_FRONT_RIGHT | SPEAKER_BACK_LEFT | SPEAKER_BACK_RIGHT	<CPE1><CPE2>
3/0	SPEAKER_FRONT_LEFT | SPEAKER_FRONT_RIGHT | SPEAKER_FRONT_CENTER	<SCE1><CPE1>
3/1	SPEAKER_FRONT_LEFT | SPEAKER_FRONT_RIGHT | SPEAKER_FRONT_CENTER | SPEAKER_BACK_CENTER	<SCE1><CPE1><SCE2>
3/2	SPEAKER_FRONT_LEFT | SPEAKER_FRONT_RIGHT | SPEAKER_FRONT_CENTER | SPEAKER_BACK_LEFT | SPEAKER_BACK_RIGHT	<SCE1><CPE1><CPE2>
3/2 + LFE	SPEAKER_FRONT_LEFT | SPEAKER_FRONT_RIGHT | SPEAKER_FRONT_CENTER | SPEAKER_LOW_FREQUENCY | SPEAKER_BACK_LEFT | SPEAKER_BACK_RIGHT	<SCE1><CPE1><CPE2><LFE>
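The channel masks in the table can be expressed as a lookup. The following sketch is illustrative only: the SpeakerConfig enumeration and the function names are hypothetical, and the speaker bit values are redefined locally (mirroring the SPEAKER_* constants in the Windows SDK) so that the code compiles without Windows headers.

```cpp
#include <cstdint>

// Speaker position bits. The values mirror the Windows SDK SPEAKER_* constants.
constexpr std::uint32_t kFrontLeft    = 0x001;
constexpr std::uint32_t kFrontRight   = 0x002;
constexpr std::uint32_t kFrontCenter  = 0x004;
constexpr std::uint32_t kLowFrequency = 0x008;
constexpr std::uint32_t kBackLeft     = 0x010;
constexpr std::uint32_t kBackRight    = 0x020;
constexpr std::uint32_t kBackCenter   = 0x100;

// One entry per row of the table above.
enum class SpeakerConfig { Mono, Stereo, C2_1, C2_2, C3_0, C3_1, C3_2, C3_2_Lfe };

// Returns the channel mask for a supported speaker configuration.
std::uint32_t ChannelMaskFor(SpeakerConfig config) {
    const std::uint32_t front = kFrontLeft | kFrontRight;
    const std::uint32_t back  = kBackLeft | kBackRight;
    switch (config) {
    case SpeakerConfig::Mono:     return kFrontCenter;
    case SpeakerConfig::Stereo:   return front;
    case SpeakerConfig::C2_1:     return front | kBackCenter;
    case SpeakerConfig::C2_2:     return front | back;
    case SpeakerConfig::C3_0:     return front | kFrontCenter;
    case SpeakerConfig::C3_1:     return front | kFrontCenter | kBackCenter;
    case SpeakerConfig::C3_2:     return front | kFrontCenter | back;
    case SpeakerConfig::C3_2_Lfe: return front | kFrontCenter | kLowFrequency | back;
    }
    return 0;
}

// Counts the set bits in a mask, which gives the number of channels.
unsigned ChannelCount(std::uint32_t mask) {
    unsigned count = 0;
    for (; mask != 0; mask &= mask - 1) {
        ++count;
    }
    return count;
}
```

The channel count derived from the mask should agree with MF_MT_AUDIO_NUM_CHANNELS. Note that the channel count alone does not identify a configuration: 2/1 and 3/0 both have three channels, and 2/2 and 3/1 both have four.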

For raw AAC, each input sample must contain exactly one full AAC compressed frame.

For ADTS, each input sample can contain multiple audio frames, as well as partial frames; that is, frames can span sample boundaries. Each ADTS header must be followed by exactly one AAC frame.
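Because ADTS frames need not align with sample boundaries, it can be useful to inspect a stream before handing it to the decoder. The following sketch parses the 7-byte fixed and variable ADTS header according to the field layout in ISO/IEC 13818-7. The AdtsHeader struct and function name are hypothetical, and the optional CRC that follows the header when protection is present is not read.

```cpp
#include <cstddef>
#include <cstdint>

// Selected fields of an ADTS header.
struct AdtsHeader {
    bool     protectionAbsent;        // true if no CRC follows the header
    unsigned samplingFrequencyIndex;  // 4 bits
    unsigned channelConfiguration;    // 3 bits
    unsigned frameLength;             // bytes, including the header
    unsigned rawDataBlocks;           // number_of_raw_data_blocks_in_frame + 1
};

// Parses the 7 header bytes at p. Returns false if the buffer is too short,
// the 12-bit syncword (0xFFF) is missing, or the layer field is not zero.
bool ParseAdtsHeader(const std::uint8_t* p, std::size_t size, AdtsHeader* out) {
    if (p == nullptr || out == nullptr || size < 7) {
        return false;
    }
    if (p[0] != 0xFF || (p[1] & 0xF0) != 0xF0 || (p[1] & 0x06) != 0) {
        return false;
    }
    out->protectionAbsent       = (p[1] & 0x01) != 0;
    out->samplingFrequencyIndex = (p[2] >> 2) & 0x0F;
    out->channelConfiguration   = ((p[2] & 0x01u) << 2) | (p[3] >> 6);
    out->frameLength            = ((p[3] & 0x03u) << 11) |
                                  (static_cast<unsigned>(p[4]) << 3) |
                                  (p[5] >> 5);
    out->rawDataBlocks          = (p[6] & 0x03u) + 1;
    return out->frameLength >= 7;
}
```

A caller could advance by frameLength bytes to locate the next header, and could treat a rawDataBlocks value other than 1 as unsupported input, since each ADTS header must be followed by one AAC frame.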

The AAC decoder does not support any of the following:

Main profile, Sample-Rate Scalable (SRS) profile, or Long Term Prediction (LTP) profile.
Audio data interchange format (ADIF).
LATM/LOAS transport streams (prior to Windows 8).
Coupling channel elements (CCEs). The decoder will skip audio frames with CCEs.
AAC-LC with a 960-sample frame size. Only 1024-sample frames are supported.

Transform Attributes

The AAC decoder implements the IMFTransform::GetAttributes method. Applications can use this method to get or set the following attributes.

Attribute	Description
CODECAPI_AVDecAudioDualMono	Specifies whether 2-channel audio is encoded as stereo or dual mono. Treat as read-only.
CODECAPI_AVDecAudioDualMonoReproMode	Specifies how the decoder reproduces dual mono audio. The default value is eAVDecAudioDualMonoReproMode_LEFT_MONO: Output Ch1 to the left and right speakers. Applications can set this property to change the default behavior.
MFT_SUPPORT_DYNAMIC_FORMAT_CHANGE	The AAC decoder does not handle dynamic format changes, and must be flushed or drained before a new input media type is set. Treat this attribute as read-only. Note: In Windows 7, the decoder incorrectly reports a value of TRUE for this attribute. In Windows 8, the decoder reports FALSE, which is the correct value.

Example Media Types

Here is an example of the input media type needed for a 6-channel, 48-kHz AAC-LC stream, using a raw AAC payload:

Attribute	Value
MF_MT_MAJOR_TYPE	MFMediaType_Audio
MF_MT_SUBTYPE	MFAudioFormat_AAC
MF_MT_AUDIO_SAMPLES_PER_SECOND	48000
MF_MT_AUDIO_NUM_CHANNELS	6
MF_MT_AAC_PAYLOAD_TYPE	0
MF_MT_USER_DATA	{0x00, 0x00, 0x2a, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x11, 0xb0}
MF_MT_AAC_AUDIO_PROFILE_LEVEL_INDICATION	0x2a (optional)

The first 12 bytes of MF_MT_USER_DATA correspond to the following HEAACWAVEINFO structure members:

wPayloadType = 0 (raw AAC)
wAudioProfileLevelIndication = 0x2a (AAC Profile, Level 4)
wStructType = 0
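The byte layout of the blob can be reproduced by hand. The following sketch assumes the HEAACWAVEINFO layout after the wfx member is three little-endian 16-bit fields (wPayloadType, wAudioProfileLevelIndication, wStructType) followed by a 16-bit and a 32-bit reserved field, which accounts for the 12 bytes. The helper names are hypothetical; the caller supplies the AudioSpecificConfig() bytes that follow the prefix.

```cpp
#include <cstdint>
#include <vector>

// Appends a 16-bit value in little-endian byte order.
void AppendLe16(std::vector<std::uint8_t>& bytes, std::uint16_t value) {
    bytes.push_back(static_cast<std::uint8_t>(value & 0xFF));
    bytes.push_back(static_cast<std::uint8_t>(value >> 8));
}

// Builds an MF_MT_USER_DATA blob for MFAudioFormat_AAC: the 12 bytes of
// HEAACWAVEINFO that follow wfx, then the AudioSpecificConfig() data.
std::vector<std::uint8_t> BuildAacUserData(
        std::uint16_t payloadType,
        std::uint16_t audioProfileLevelIndication,
        const std::vector<std::uint8_t>& audioSpecificConfig) {
    std::vector<std::uint8_t> blob;
    AppendLe16(blob, payloadType);                  // wPayloadType
    AppendLe16(blob, audioProfileLevelIndication);  // wAudioProfileLevelIndication
    AppendLe16(blob, 0);                            // wStructType
    AppendLe16(blob, 0);                            // reserved, 16 bits
    AppendLe16(blob, 0);                            // reserved, 32 bits (low half)
    AppendLe16(blob, 0);                            // reserved, 32 bits (high half)
    blob.insert(blob.end(), audioSpecificConfig.begin(), audioSpecificConfig.end());
    return blob;
}
```

Calling BuildAacUserData(0, 0x2a, {0x11, 0xb0}) yields the 14 bytes shown in the MF_MT_USER_DATA row of the example table.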

The last two bytes of MF_MT_USER_DATA contain the value of AudioSpecificConfig(), as defined by MPEG-4.

AudioSpecificConfig.audioObjectType = 2 (AAC LC) (5 bits)
AudioSpecificConfig.samplingFrequencyIndex = 3 (4 bits)
AudioSpecificConfig.channelConfiguration = 6 (4 bits)
GASpecificConfig.frameLengthFlag = 0 (1 bit)
GASpecificConfig.dependsOnCoreCoder = 0 (1 bit)
GASpecificConfig.extensionFlag = 0 (1 bit)
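The two bytes 0x11, 0xb0 follow from packing these fields most significant bit first. The following is a minimal sketch with a hypothetical function name. It covers only the 2-byte AAC-LC form, with the three GASpecificConfig flags set to zero and no escape codes.

```cpp
#include <array>
#include <cstdint>

// Packs a 2-byte AudioSpecificConfig() for AAC-LC:
//   audioObjectType (5 bits) = 2
//   samplingFrequencyIndex (4 bits)
//   channelConfiguration (4 bits)
//   frameLengthFlag, dependsOnCoreCoder, extensionFlag (1 bit each) = 0
std::array<std::uint8_t, 2> PackAacLcConfig(unsigned samplingFrequencyIndex,
                                            unsigned channelConfiguration) {
    const unsigned audioObjectType = 2;  // AAC-LC
    const unsigned bits = (audioObjectType << 11) |
                          ((samplingFrequencyIndex & 0x0F) << 7) |
                          ((channelConfiguration & 0x0F) << 3);
    return {{static_cast<std::uint8_t>(bits >> 8),
             static_cast<std::uint8_t>(bits & 0xFF)}};
}
```

With samplingFrequencyIndex 3 (48 kHz) and channelConfiguration 6, the bit pattern is 00010 0011 0110 000, which is 0x11 followed by 0xb0.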

Given this input type, use the following output media type to get 6-channel, 32-bit floating point PCM audio from the decoder:

Attribute	Value
MF_MT_MAJOR_TYPE	MFMediaType_Audio
MF_MT_SUBTYPE	MFAudioFormat_Float
MF_MT_AUDIO_BITS_PER_SAMPLE	32
MF_MT_AUDIO_SAMPLES_PER_SECOND	48000
MF_MT_AUDIO_NUM_CHANNELS	6
MF_MT_AUDIO_AVG_BYTES_PER_SECOND	1152000 (optional)
MF_MT_AUDIO_BLOCK_ALIGNMENT	24 (optional)
MF_MT_AUDIO_CHANNEL_MASK	0x3f (optional)
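The optional block alignment and average byte rate values follow directly from the other attributes. The following sketch shows the arithmetic for uncompressed output; the PcmLayout struct and function name are hypothetical.

```cpp
#include <cstdint>

// Derived values for an uncompressed (PCM or float) audio format.
struct PcmLayout {
    std::uint32_t blockAlignment;     // bytes per sample frame (all channels)
    std::uint32_t avgBytesPerSecond;  // bytes per second of audio
};

// Computes block alignment and average byte rate from the channel count,
// bit depth, and sample rate.
PcmLayout ComputePcmLayout(std::uint32_t channels,
                           std::uint32_t bitsPerSample,
                           std::uint32_t samplesPerSecond) {
    PcmLayout layout;
    layout.blockAlignment = channels * (bitsPerSample / 8);
    layout.avgBytesPerSecond = layout.blockAlignment * samplesPerSecond;
    return layout;
}
```

For 6 channels of 32-bit samples at 48000 Hz, the block alignment is 6 x 4 = 24 bytes and the average byte rate is 24 x 48000 = 1152000, matching the table.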

If the Platform Update Supplement for Windows Vista is installed, the AAC decoder is also available on Windows Vista, but it is accessible only through the Source Reader.

Requirements

Requirement	Value
Minimum supported client	Windows 7 [desktop apps only]
Minimum supported server	Windows Server 2008 R2 [desktop apps only]
DLL	Msmpeg2adec.dll on Windows 7; MSAudDecMFT.dll on Windows 8
