Using the Voice Capture DSP
This topic outlines the steps for using the Voice Capture digital signal processor (DSP) as a DirectX Media Object (DMO).
1. Create the DMO
Create the voice capture DMO by calling CoCreateInstance with the CLSID CLSID_CWMAudioAEC. The voice capture DSP exposes only the IMediaObject and IPropertyStore interfaces, so it can be used only as a DMO.
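For example, the following sketch creates the DSP and queries for both interfaces. The pointer names pDMO and pPS are illustrative and are reused by the later examples in this topic; error handling is abbreviated.

// Headers (approximate): dmo.h for IMediaObject, propsys.h for IPropertyStore,
// and wmcodecdsp.h for CLSID_CWMAudioAEC and the MFPKEY_WMAAECMA_* property keys.
#include <dmo.h>
#include <propsys.h>
#include <wmcodecdsp.h>

IMediaObject   *pDMO = NULL;
IPropertyStore *pPS  = NULL;

// Assumes COM has already been initialized with CoInitializeEx.
HRESULT hr = CoCreateInstance(CLSID_CWMAudioAEC, NULL, CLSCTX_INPROC_SERVER,
                              IID_PPV_ARGS(&pDMO));
if (SUCCEEDED(hr))
{
    hr = pDMO->QueryInterface(IID_PPV_ARGS(&pPS));
}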
2. Select Source Mode or Filter Mode
The DMO defaults to source mode. To select filter mode, use the IPropertyStore interface to set the MFPKEY_WMAAECMA_DMO_SOURCE_MODE property to VARIANT_FALSE.
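For example, a minimal sketch that switches the DMO to filter mode, using the pPS pointer obtained in step 1:

// Switch from the default source mode to filter mode.
PROPVARIANT pvMode;
PropVariantInit(&pvMode);
pvMode.vt = VT_BOOL;
pvMode.boolVal = VARIANT_FALSE;   // VARIANT_FALSE selects filter mode.
hr = pPS->SetValue(MFPKEY_WMAAECMA_DMO_SOURCE_MODE, pvMode);
PropVariantClear(&pvMode);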
3. Set the Processing Parameters
Next, configure the internal properties of the DMO by using the IPropertyStore interface. The only property that an application must set is the MFPKEY_WMAAECMA_SYSTEM_MODE property. This property configures the processing pipeline within the DMO. The other properties are optional.
The following properties are supported by the DMO.
| Property | Description |
| --- | --- |
| MFPKEY_WMAAECMA_DEVICE_INDEXES | Specifies which audio devices the DMO uses for capturing and rendering audio. |
| MFPKEY_WMAAECMA_DEVICEPAIR_GUID | Identifies the combination of audio devices that the application is currently using. |
| MFPKEY_WMAAECMA_DMO_SOURCE_MODE | Specifies whether the DMO uses source mode or filter mode. |
| MFPKEY_WMAAECMA_FEATR_AES | Specifies how many times the DMO performs acoustic echo suppression (AES) on the residual signal. |
| MFPKEY_WMAAECMA_FEATR_AGC | Specifies whether the DMO performs automatic gain control. |
| MFPKEY_WMAAECMA_FEATR_CENTER_CLIP | Specifies whether the DMO performs center clipping. |
| MFPKEY_WMAAECMA_FEATR_ECHO_LENGTH | Specifies the duration of echo that the acoustic echo cancellation (AEC) algorithm can handle. |
| MFPKEY_WMAAECMA_FEATR_FRAME_SIZE | Specifies the audio frame size. |
| MFPKEY_WMAAECMA_FEATR_MICARR_BEAM | Specifies which beam the DMO uses for microphone array processing. |
| MFPKEY_WMAAECMA_FEATR_MICARR_MODE | Specifies how the DMO performs microphone array processing. |
| MFPKEY_WMAAECMA_FEATR_MICARR_PREPROC | Specifies whether the DMO performs microphone array preprocessing. |
| MFPKEY_WMAAECMA_FEATR_NOISE_FILL | Specifies whether the DMO performs noise filling. |
| MFPKEY_WMAAECMA_FEATR_NS | Specifies whether the DMO performs noise suppression. |
| MFPKEY_WMAAECMA_FEATR_VAD | Specifies the type of voice activity detection that the DMO performs. |
| MFPKEY_WMAAECMA_FEATURE_MODE | Enables the application to override the default settings on various properties. |
| MFPKEY_WMAAECMA_MIC_GAIN_BOUNDER | Specifies whether the DMO applies microphone gain bounding. |
| MFPKEY_WMAAECMA_MICARRAY_DESCPTR | Specifies the microphone array geometry. |
| MFPKEY_WMAAECMA_QUALITY_METRICS | Retrieves quality metrics for AEC. |
| MFPKEY_WMAAECMA_RETRIEVE_TS_STATS | Specifies whether the DMO stores time stamp statistics in the registry. |
| MFPKEY_WMAAECMA_SYSTEM_MODE | Sets the processing mode. |
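For example, the following sketch sets the required MFPKEY_WMAAECMA_SYSTEM_MODE property to single-channel AEC processing. It assumes the value SINGLE_CHANNEL_AEC from the AEC_SYSTEM_MODE enumeration; other enumeration values select the microphone array modes.

// Set the required system mode: single-channel acoustic echo cancellation.
PROPVARIANT pvSysMode;
PropVariantInit(&pvSysMode);
pvSysMode.vt = VT_I4;
pvSysMode.lVal = SINGLE_CHANNEL_AEC;   // Value from the AEC_SYSTEM_MODE enumeration.
hr = pPS->SetValue(MFPKEY_WMAAECMA_SYSTEM_MODE, pvSysMode);
PropVariantClear(&pvSysMode);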
4. Set the Input and Output Formats
If you are using the DMO in filter mode, set the input format by calling IMediaObject::SetInputType. The input format can be almost any valid uncompressed PCM or IEEE floating-point audio type. If the input format does not match the output format, the DMO automatically performs sample-rate conversion.
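For filter mode, the following sketch applies one format to both input streams. It assumes mt is a DMO_MEDIA_TYPE initialized in the same way as in the output-type example later in this step, and that both streams use the same format.

// Filter mode only: set the media type on both input streams.
// Stream 0 carries the microphone signal; stream 1 carries the
// speaker (render) signal.
hr = pDMO->SetInputType(0, &mt, 0);
if (SUCCEEDED(hr))
{
    hr = pDMO->SetInputType(1, &mt, 0);
}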
If you are using the DMO in source mode, do not set the input format. The DMO automatically configures the input format based on the audio devices.
In either mode, set the output format by calling IMediaObject::SetOutputType. The DMO can accept the following output formats:
- Subtype: MEDIASUBTYPE_PCM or MEDIASUBTYPE_IEEE_FLOAT
- Format block: WAVEFORMAT or WAVEFORMATEX
- Samples per second: 8,000; 11,025; 16,000; or 22,050
- Channels: 1 for AEC-only mode, 2 or 4 for microphone array processing
- Bits per sample: 16
The following code sets the output type to 16-bit single-channel PCM audio:
// pDMO is the IMediaObject pointer obtained when the DMO was created.
DMO_MEDIA_TYPE mt;   // Media type.
mt.majortype = MEDIATYPE_Audio;
mt.subtype = MEDIASUBTYPE_PCM;
mt.lSampleSize = 0;
mt.bFixedSizeSamples = TRUE;
mt.bTemporalCompression = FALSE;
mt.formattype = FORMAT_WaveFormatEx;

// Allocate the format block to hold the WAVEFORMATEX structure.
hr = MoInitMediaType(&mt, sizeof(WAVEFORMATEX));
if (SUCCEEDED(hr))
{
    WAVEFORMATEX *pwav = (WAVEFORMATEX*)mt.pbFormat;
    pwav->wFormatTag = WAVE_FORMAT_PCM;
    pwav->nChannels = 1;
    pwav->nSamplesPerSec = 16000;
    pwav->nAvgBytesPerSec = 32000;
    pwav->nBlockAlign = 2;
    pwav->wBitsPerSample = 16;
    pwav->cbSize = 0;

    // Set the output type on stream 0.
    hr = pDMO->SetOutputType(0, &mt, 0);

    // Free the format block.
    MoFreeMediaType(&mt);
}
5. Process Data
Before processing any data, your application should call IMediaObject::AllocateStreamingResources. This method allocates the resources used internally by the DMO. Call AllocateStreamingResources after the steps listed previously, not before. (However, if you do not call this method, the DMO automatically allocates resources on the first call to ProcessInput or ProcessOutput.)
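A minimal call, placed after the configuration steps above:

// Allocate the DMO's internal streaming resources once the formats
// and properties have been configured.
hr = pDMO->AllocateStreamingResources();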
If you are using the DMO in filter mode, you must pass input data to the DMO by calling IMediaObject::ProcessInput. The audio data from the microphone goes to stream 0, and the audio data from the speaker line goes to stream 1. If you are using the DMO in source mode, you do not need to call ProcessInput.
To get output data from the DSP, perform the following steps:
- Create a buffer object to hold the output data. The buffer object must implement the IMediaBuffer interface. The size of the buffer depends on the requirements of your application. Allocating a larger buffer can reduce the chances of glitches occurring.
- Declare a DMO_OUTPUT_DATA_BUFFER structure and set the pBuffer member to point to your buffer object.
- Pass the DMO_OUTPUT_DATA_BUFFER structure to the IMediaObject::ProcessOutput method.
- Continue to call this method for as long as the DMO has output data. The DSP signals that it has more output by setting the DMO_OUTPUT_DATA_BUFFERF_INCOMPLETE flag in the dwStatus member of the DMO_OUTPUT_DATA_BUFFER structure.
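The following sketch shows such an output loop. It assumes pBuffer points to an application-supplied object that implements IMediaBuffer, and it reads from output stream 0.

// Pull output from the DMO until it reports no more data.
DMO_OUTPUT_DATA_BUFFER outputBuffer;
DWORD dwStatus = 0;

do
{
    pBuffer->SetLength(0);                            // Reset the application buffer.
    ZeroMemory(&outputBuffer, sizeof(outputBuffer));
    outputBuffer.pBuffer = pBuffer;

    hr = pDMO->ProcessOutput(0, 1, &outputBuffer, &dwStatus);
    if (FAILED(hr))
    {
        break;
    }
    // S_FALSE means the DMO produced no output on this call.
    // Otherwise, consume the data in pBuffer here.

} while (outputBuffer.dwStatus & DMO_OUTPUT_DATA_BUFFERF_INCOMPLETE);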