AAC Decoder
The Microsoft Media Foundation AAC decoder is a Media Foundation Transform that decodes the following Advanced Audio Coding (AAC) and High Efficiency AAC (HE-AAC) profiles:
- MPEG-2 AAC Low Complexity (LC) profile (multichannel).
- MPEG-4 HE-AAC v1 (multichannel) with AAC-LC core.
- MPEG-4 HE-AAC v2 (stereo) with AAC-LC core.
The AAC decoder supports both raw AAC streams with no headers and AAC in an audio data transport stream (ADTS).
Starting in Windows 8, the AAC decoder also supports decoding MPEG-4 audio transport streams with a multiplex layer (LATM) and synchronization layer (LOAS). It can also convert an LATM/LOAS stream to ADTS.
Class Identifier
The class identifier (CLSID) of the AAC decoder is CLSID_CMSAACDecMFT, defined in the header file wmcodecdsp.h.
Media Types
The AAC decoder supports the following media types.
Input Types
The AAC decoder supports the following audio subtypes:
Subtype | Description | Header |
---|---|---|
MFAudioFormat_AAC | Raw AAC or ADTS AAC. For this subtype, the media type gives the sample rate and number of channels prior to the application of spectral band replication (SBR) and parametric stereo (PS) tools, if present. The effect of the SBR tool is to double the decoded sample rate relative to the core AAC-LC sample rate. The effect of the PS tool is to decode stereo from a mono-channel core AAC-LC stream. This subtype is equivalent to MEDIASUBTYPE_MPEG_HEAAC, defined in wmcodecdsp.h. See Audio Subtype GUIDs. The MPEG-4 File Source and the ADTS Parser output this subtype. | mfapi.h |
MEDIASUBTYPE_RAW_AAC1 | Raw AAC. This subtype is used for AAC contained in an AVI file with the audio format tag equal to WAVE_FORMAT_RAW_AAC1 (0x00FF). For this subtype, the media type gives the sample rate and number of channels after the SBR and PS tools are applied, if present. | wmcodecdsp.h |
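The difference between the two subtypes comes down to whether the media type describes the core AAC-LC stream or the decoded output. The following is a minimal sketch of that relationship, based only on the description in the table above. The names `AacFormat` and `ApplyHeAacTools` are hypothetical helpers for illustration and are not part of Media Foundation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type; not part of the Media Foundation API.
struct AacFormat {
    std::uint32_t sampleRate;  // samples per second
    std::uint32_t channels;    // channel count
};

// Derives the decoded (post-SBR/PS) format from the core AAC-LC format:
// SBR doubles the sample rate, and PS produces stereo from a mono core.
inline AacFormat ApplyHeAacTools(AacFormat core, bool hasSbr, bool hasPs) {
    // PS is described as operating on a mono-channel core stream.
    assert(!hasPs || core.channels == 1);
    AacFormat out = core;
    if (hasSbr) out.sampleRate *= 2;
    if (hasPs) out.channels = 2;
    return out;
}
```

For a 24-kHz mono core stream with both SBR and PS, an MFAudioFormat_AAC media type would carry the core values (24000, 1), while a MEDIASUBTYPE_RAW_AAC1 media type would carry the values this function returns (48000, 2).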
To configure the AAC decoder, set the following attributes on the input media type.
Attribute | Description | Remarks |
---|---|---|
MF_MT_MAJOR_TYPE | Major type. | Must be MFMediaType_Audio. |
MF_MT_SUBTYPE | Audio subtype. | Refer to the previous description for details. |
MF_MT_AAC_AUDIO_PROFILE_LEVEL_INDICATION | Audio profile and level. | Optional. Applies only to MFAudioFormat_AAC. The value of this attribute is the audioProfileLevelIndication field, as defined by ISO/IEC 14496-3. If unknown, set to zero or 0xFE ("no audio profile specified"). |
MF_MT_AAC_PAYLOAD_TYPE | Payload type. | Applies only to MFAudioFormat_AAC. The decoder supports the following payload types: 0 (raw AAC), 1 (ADTS), and 3 (LOAS; requires Windows 8). |
MF_MT_AUDIO_BITS_PER_SAMPLE | Desired bit depth of the decoded PCM audio. | |
MF_MT_AUDIO_CHANNEL_MASK | Specifies the assignment of audio channels to speaker positions. | Optional. For more information, see Format Constraints. |
MF_MT_AUDIO_NUM_CHANNELS | Number of channels, including the low frequency (LFE) channel, if present. | The interpretation of this value depends on the media subtype, as described previously. |
MF_MT_AUDIO_SAMPLES_PER_SECOND | Sample rate, in samples per second. | The interpretation of this value depends on the media subtype, as described previously. |
MF_MT_USER_DATA | Additional format information. | The value of this attribute depends on the subtype. The value of audioObjectType as defined in AudioSpecificConfig() must be 2, indicating AAC-LC. The value of extensionAudioObjectType must be 5 for SBR or 29 for PS. |
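An application can check the audioObjectType requirement before handing a media type to the decoder. The sketch below reads the leading fields of AudioSpecificConfig() (5 bits of audioObjectType, 4 bits of samplingFrequencyIndex, 4 bits of channelConfiguration, as defined by MPEG-4). It is an illustration under simplifying assumptions: it does not handle the escape values (audioObjectType 31, samplingFrequencyIndex 15) or parse any extension signalling, and the names `AscHeader`, `ParseAscHeader`, and `IsAacLcCore` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical holder for the leading AudioSpecificConfig() fields.
struct AscHeader {
    unsigned audioObjectType;
    unsigned samplingFrequencyIndex;
    unsigned channelConfiguration;
};

// Reads the first 13 bits of AudioSpecificConfig() from a byte buffer.
// Returns false if the buffer is too short or an escape value is used,
// because this sketch does not decode those cases.
inline bool ParseAscHeader(const std::uint8_t* data, std::size_t size,
                           AscHeader* out) {
    assert(out != nullptr);
    if (data == nullptr || size < 2) return false;
    const unsigned bits = (static_cast<unsigned>(data[0]) << 8) | data[1];
    out->audioObjectType = bits >> 11;               // top 5 bits
    out->samplingFrequencyIndex = (bits >> 7) & 0xF; // next 4 bits
    out->channelConfiguration = (bits >> 3) & 0xF;   // next 4 bits
    return out->audioObjectType != 31 && out->samplingFrequencyIndex != 15;
}

// The decoder requires an AAC-LC core, which is audioObjectType 2.
inline bool IsAacLcCore(const AscHeader& h) {
    return h.audioObjectType == 2;
}
```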
Output Types
The decoder supports the following output types:
Subtype | Description |
---|---|
MFAudioFormat_Float | IEEE floating-point audio. |
MFAudioFormat_PCM | 16-bit PCM audio. |
MFAudioFormat_AAC | Requires Windows 8. This output type can be used to convert an AAC stream in the LOAS/LATM format to ADTS format. To convert an LOAS/LATM stream to an ADTS stream, set the input type to MFAudioFormat_AAC with payload type 3 (LOAS). Then set the output type to MFAudioFormat_AAC with payload type 1 (ADTS). The decoder will reformat the container without decoding the bitstream. Note: The decoder does not register MFAudioFormat_AAC as an output type. However, if the application sets the input type as described, the IMFTransform::GetOutputAvailableType method returns MFAudioFormat_AAC in the list of available output types. |
If the input stream contains more than two channels, the AAC decoder provides two options for the output format:
- The same channel configuration as the input type.
- Stereo fold-down.
Format Constraints
The decoded audio sampling rate must be one of the following, after SBR is applied (if present):
- 8 kHz
- 11.025 kHz
- 12 kHz
- 16 kHz
- 22.05 kHz
- 24 kHz
- 32 kHz
- 44.1 kHz
- 48 kHz
Sampling rates above 48 kHz are not supported.
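Because the constraint applies to the rate after SBR, a stream whose core rate is supported can still be rejected once SBR doubles it. The following sketch checks a stream against the list above. The function names are hypothetical and the check reflects only the rates listed in this section, not any additional validation the decoder may perform.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

// Returns true if hz is one of the decoded sampling rates listed above.
inline bool IsSupportedOutputSampleRate(std::uint32_t hz) {
    static constexpr std::array<std::uint32_t, 9> kRates = {
        {8000, 11025, 12000, 16000, 22050, 24000, 32000, 44100, 48000}};
    return std::find(kRates.begin(), kRates.end(), hz) != kRates.end();
}

// coreRateHz is the AAC-LC core rate. SBR, if present, doubles it, and
// the doubled rate is the one that must appear in the supported list.
inline bool IsSupportedStream(std::uint32_t coreRateHz, bool hasSbr) {
    return IsSupportedOutputSampleRate(hasSbr ? coreRateHz * 2 : coreRateHz);
}
```

For example, a 24-kHz core with SBR decodes to 48 kHz and passes, while a 48-kHz core with SBR would decode to 96 kHz and fails.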
The decoder supports up to 6 audio channels. For each speaker configuration, the decoder expects the AAC syntactic elements to appear in a certain order. The following table lists the supported speaker configurations. The third column of the table lists the expected syntactic elements and their order, using the following notation:
- <SCE1>: The single_channel_element (SCE) associated with the front center speaker.
- <SCE2>: The SCE associated with the back center speaker.
- <CPE1>: The channel_pair_element (CPE) associated with the front speakers.
- <CPE2>: The CPE associated with the back (or side) speakers.
- <LFE>: The lfe_channel_element (LFE).
For more information about these syntactic elements, refer to ISO/IEC 13818-7.
Configuration | Channel Mask | AAC Syntactic Elements |
---|---|---|
Mono | SPEAKER_FRONT_CENTER | <SCE1> |
Stereo or dual mono | SPEAKER_FRONT_LEFT \| SPEAKER_FRONT_RIGHT | <CPE1> |
2/1 | SPEAKER_FRONT_LEFT \| SPEAKER_FRONT_RIGHT \| SPEAKER_BACK_CENTER | <CPE1><SCE1> |
2/2 | SPEAKER_FRONT_LEFT \| SPEAKER_FRONT_RIGHT \| SPEAKER_BACK_LEFT \| SPEAKER_BACK_RIGHT | <CPE1><CPE2> |
3/0 | SPEAKER_FRONT_LEFT \| SPEAKER_FRONT_RIGHT \| SPEAKER_FRONT_CENTER | <SCE1><CPE1> |
3/1 | SPEAKER_FRONT_LEFT \| SPEAKER_FRONT_RIGHT \| SPEAKER_FRONT_CENTER \| SPEAKER_BACK_CENTER | <SCE1><CPE1><SCE2> |
3/2 | SPEAKER_FRONT_LEFT \| SPEAKER_FRONT_RIGHT \| SPEAKER_FRONT_CENTER \| SPEAKER_BACK_LEFT \| SPEAKER_BACK_RIGHT | <SCE1><CPE1><CPE2> |
3/2 + LFE | SPEAKER_FRONT_LEFT \| SPEAKER_FRONT_RIGHT \| SPEAKER_FRONT_CENTER \| SPEAKER_LOW_FREQUENCY \| SPEAKER_BACK_LEFT \| SPEAKER_BACK_RIGHT | <SCE1><CPE1><CPE2><LFE> |
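The channel masks in the table are bitwise combinations of speaker position flags. The sketch below builds the eight supported masks and checks a candidate against them. The flag values mirror the SPEAKER_* constants in ksmedia.h, redefined here under hypothetical `k`-prefixed names so the example compiles without Windows headers. `CountChannels` and `IsSupportedChannelMask` are also hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// Speaker position flags, with the same values as SPEAKER_* in ksmedia.h.
constexpr std::uint32_t kFrontLeft = 0x1;
constexpr std::uint32_t kFrontRight = 0x2;
constexpr std::uint32_t kFrontCenter = 0x4;
constexpr std::uint32_t kLowFrequency = 0x8;
constexpr std::uint32_t kBackLeft = 0x10;
constexpr std::uint32_t kBackRight = 0x20;
constexpr std::uint32_t kBackCenter = 0x100;

// Counts the speaker positions set in a mask (one channel per set bit).
constexpr unsigned CountChannels(std::uint32_t mask) {
    return mask == 0 ? 0u : (mask & 1u) + CountChannels(mask >> 1);
}

// Returns true if the mask matches one of the configurations in the table.
inline bool IsSupportedChannelMask(std::uint32_t mask) {
    constexpr std::uint32_t kFront = kFrontLeft | kFrontRight;
    constexpr std::uint32_t kBack = kBackLeft | kBackRight;
    const std::uint32_t supported[] = {
        kFrontCenter,                                   // Mono
        kFront,                                         // Stereo or dual mono
        kFront | kBackCenter,                           // 2/1
        kFront | kBack,                                 // 2/2
        kFront | kFrontCenter,                          // 3/0
        kFront | kFrontCenter | kBackCenter,            // 3/1
        kFront | kFrontCenter | kBack,                  // 3/2
        kFront | kFrontCenter | kLowFrequency | kBack,  // 3/2 + LFE
    };
    for (std::uint32_t m : supported) {
        if (m == mask) return true;
    }
    return false;
}
```

The 3/2 + LFE mask evaluates to 0x3f, which has six bits set, one for each of the six channels.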
For raw AAC, each input sample must contain exactly one full AAC compressed frame.
For ADTS, each input sample can contain multiple audio frames, as well as partial frames; that is, frames can span sample boundaries. Each ADTS header must be followed by one AAC frame.
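To illustrate what multiple and partial frames look like in a buffer, the following sketch walks consecutive ADTS frames and reports how many are complete. It relies on the ADTS header layout from ISO/IEC 13818-7 (a 12-bit 0xFFF syncword and a 13-bit frame length that includes the header). It is not the decoder's own parsing logic, it assumes the buffer starts on a frame boundary, and the names `AdtsScanResult` and `ScanAdtsFrames` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct AdtsScanResult {
    std::size_t completeFrames;  // frames fully contained in the buffer
    std::size_t bytesConsumed;   // bytes covered by those frames
};

// Walks consecutive ADTS frames. Stops at the first missing syncword or
// at a frame that extends past the end of the buffer (a partial frame).
inline AdtsScanResult ScanAdtsFrames(const std::uint8_t* data,
                                     std::size_t size) {
    AdtsScanResult r{0, 0};
    std::size_t pos = 0;
    while (size - pos >= 7) {  // 7 bytes: ADTS header without CRC
        const std::uint8_t* h = data + pos;
        // Check the 12-bit syncword 0xFFF.
        if (h[0] != 0xFF || (h[1] & 0xF0) != 0xF0) break;
        // The 13-bit frame length spans bytes 3 to 5 and includes the header.
        const std::size_t frameLen =
            (static_cast<std::size_t>(h[3] & 0x03) << 11) |
            (static_cast<std::size_t>(h[4]) << 3) |
            (static_cast<std::size_t>(h[5]) >> 5);
        if (frameLen < 7 || frameLen > size - pos) break;
        pos += frameLen;
        ++r.completeFrames;
    }
    r.bytesConsumed = pos;
    return r;
}
```

Any bytes after `bytesConsumed` belong to a frame that continues in the next input sample.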
The AAC decoder does not support any of the following:
- Main profile, Sample-Rate Scalable (SRS) profile, or Long Term Prediction (LTP) profile.
- Audio data interchange format (ADIF).
- LATM/LOAS transport streams (prior to Windows 8).
- Coupling channel elements (CCEs). The decoder will skip audio frames with CCEs.
- AAC-LC with a 960-sample frame size. Only 1024-sample frames are supported.
Transform Attributes
The AAC decoder implements the IMFTransform::GetAttributes method. Applications can use this method to get or set the following attributes.
Attribute | Description |
---|---|
CODECAPI_AVDecAudioDualMono | Specifies whether 2-channel audio is encoded as stereo or dual mono. Treat as read-only. |
CODECAPI_AVDecAudioDualMonoReproMode | Specifies how the decoder reproduces dual mono audio. The default value is eAVDecAudioDualMonoReproMode_LEFT_MONO: Output Ch1 to the left and right speakers. Applications can set this property to change the default behavior. |
MFT_SUPPORT_DYNAMIC_FORMAT_CHANGE | The AAC decoder does not handle dynamic format changes, and must be flushed or drained before a new input media type is set. Treat this attribute as read-only. Note: In Windows 7, the decoder incorrectly reports a value of TRUE for this attribute. In Windows 8, the decoder reports FALSE, which is the correct value. |
Example Media Types
Here is an example of the input media type needed for a 6-channel, 48-kHz AAC-LC stream, using a raw AAC payload:
Attribute | Value |
---|---|
MF_MT_MAJOR_TYPE | MFMediaType_Audio |
MF_MT_SUBTYPE | MFAudioFormat_AAC |
MF_MT_AUDIO_SAMPLES_PER_SECOND | 48000 |
MF_MT_AUDIO_NUM_CHANNELS | 6 |
MF_MT_AAC_PAYLOAD_TYPE | 0 |
MF_MT_USER_DATA | {0x00, 0x00, 0x2a, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x11, 0xb0} |
MF_MT_AAC_AUDIO_PROFILE_LEVEL_INDICATION | 0x2a (optional) |
The first 12 bytes of MF_MT_USER_DATA correspond to the following HEAACWAVEINFO structure members:
- wPayloadType = 0 (raw AAC)
- wAudioProfileLevelIndication = 0x2a (AAC Profile, Level 4)
- wStructType = 0
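The 12-byte prefix can be produced by writing the three members as little-endian 16-bit values and zero-filling the rest. The sketch below does this under one assumption that is not stated in the list above: the remaining six bytes are reserved fields that stay zero, which matches the example bytes. The function name `PackHeAacInfoTail` is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Serializes the HEAACWAVEINFO members listed above as little-endian
// 16-bit values. The remaining six bytes are assumed to be reserved and
// are left as zero, matching the example MF_MT_USER_DATA value.
inline std::array<std::uint8_t, 12> PackHeAacInfoTail(
    std::uint16_t payloadType, std::uint16_t profileLevel,
    std::uint16_t structType) {
    std::array<std::uint8_t, 12> out{};  // value-initialized to all zeros
    const std::uint16_t fields[3] = {payloadType, profileLevel, structType};
    for (int i = 0; i < 3; ++i) {
        out[2 * i] = static_cast<std::uint8_t>(fields[i] & 0xFF);  // low byte
        out[2 * i + 1] = static_cast<std::uint8_t>(fields[i] >> 8); // high byte
    }
    return out;
}
```

Calling `PackHeAacInfoTail(0, 0x2a, 0)` yields 0x00, 0x00, 0x2a, 0x00 followed by eight zero bytes.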
The last two bytes of MF_MT_USER_DATA contain the value of AudioSpecificConfig(), as defined by MPEG-4.
- AudioSpecificConfig.audioObjectType = 2 (AAC LC) (5 bits)
- AudioSpecificConfig.samplingFrequencyIndex = 3 (4 bits)
- AudioSpecificConfig.channelConfiguration = 6 (4 bits)
- GASpecificConfig.frameLengthFlag = 0 (1 bit)
- GASpecificConfig.dependsOnCoreCoder = 0 (1 bit)
- GASpecificConfig.extensionFlag = 0 (1 bit)
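Packing these fields most-significant-bit first gives the two bytes 0x11, 0xb0 from the example. The following sketch shows the bit arithmetic for this simple case only. It assumes all three GASpecificConfig flags are zero and does not handle escape values or extension signalling. The function name `PackAudioSpecificConfig` is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Packs a two-byte AudioSpecificConfig(): 5 bits audioObjectType,
// 4 bits samplingFrequencyIndex, 4 bits channelConfiguration, then the
// three GASpecificConfig flag bits, all assumed to be zero.
inline std::array<std::uint8_t, 2> PackAudioSpecificConfig(
    unsigned audioObjectType, unsigned samplingFrequencyIndex,
    unsigned channelConfiguration) {
    assert(audioObjectType < 31);         // 31 is an escape value
    assert(samplingFrequencyIndex < 15);  // 15 signals an explicit rate
    assert(channelConfiguration < 16);
    const unsigned bits = (audioObjectType << 11) |
                          (samplingFrequencyIndex << 7) |
                          (channelConfiguration << 3);
    std::array<std::uint8_t, 2> out = {
        {static_cast<std::uint8_t>(bits >> 8),
         static_cast<std::uint8_t>(bits & 0xFF)}};
    return out;
}
```

With the values listed above, `PackAudioSpecificConfig(2, 3, 6)` computes 0x1000 | 0x0180 | 0x0030 = 0x11b0.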
Given this input type, use the following output media type to get 6-channel, 32-bit floating point PCM audio from the decoder:
Attribute | Value |
---|---|
MF_MT_MAJOR_TYPE | MFMediaType_Audio |
MF_MT_SUBTYPE | MFAudioFormat_Float |
MF_MT_AUDIO_BITS_PER_SAMPLE | 32 |
MF_MT_AUDIO_SAMPLES_PER_SECOND | 48000 |
MF_MT_AUDIO_NUM_CHANNELS | 6 |
MF_MT_AUDIO_AVG_BYTES_PER_SECOND | 1152000 (optional) |
MF_MT_AUDIO_BLOCK_ALIGNMENT | 24 (optional) |
MF_MT_AUDIO_CHANNEL_MASK | 0x3f (optional) |
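The block alignment and average byte rate in this table follow from the channel count, bit depth, and sample rate. The sketch below shows the arithmetic for uncompressed PCM or float output. The names `PcmDerived` and `DerivePcmValues` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

struct PcmDerived {
    std::uint32_t blockAlignment;     // bytes per sample frame, all channels
    std::uint32_t avgBytesPerSecond;  // blockAlignment * sample rate
};

// Computes the derived attribute values for uncompressed audio.
inline PcmDerived DerivePcmValues(std::uint32_t channels,
                                  std::uint32_t bitsPerSample,
                                  std::uint32_t samplesPerSecond) {
    assert(bitsPerSample % 8 == 0);  // whole bytes per sample
    PcmDerived d;
    d.blockAlignment = channels * (bitsPerSample / 8);
    d.avgBytesPerSecond = d.blockAlignment * samplesPerSecond;
    return d;
}
```

For 6 channels of 32-bit samples at 48 kHz, this gives a block alignment of 6 × 4 = 24 bytes and an average rate of 24 × 48000 = 1152000 bytes per second.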
If Platform Update Supplement for Windows Vista is installed, the AAC audio decoder is available on Windows Vista, but only through the Source Reader.
Requirements
Requirement | Value |
---|---|
Minimum supported client | Windows 7 [desktop apps only] |
Minimum supported server | Windows Server 2008 R2 [desktop apps only] |
DLL | |