Audio Stream
Kinect for Windows 1.5, 1.6, 1.7, 1.8
The Kinect sensor includes a four-element, linear microphone array, shown here in purple.
The microphone array captures audio data at 24-bit resolution, which preserves accuracy across a wide dynamic range of voice data, from normal speech at three or more meters to a person yelling.
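As a rough illustration of what the bit depth buys, the theoretical dynamic range of an ideal linear quantizer grows by about 6.02 dB per bit. The sketch below is a back-of-the-envelope calculation only; the helper name DynamicRangeDb is hypothetical and not part of the SDK, and real hardware delivers less than this ideal figure.

```cpp
#include <cassert>
#include <cmath>

// Theoretical dynamic range, in decibels, of an ideal linear PCM quantizer
// with the given bit depth: 20 * log10(2^bits), roughly 6.02 dB per bit.
// Hypothetical helper for illustration; not a Kinect SDK function.
double DynamicRangeDb(int bits)
{
    assert(bits > 0);
    return 20.0 * std::log10(std::pow(2.0, bits));
}
```

For 24 bits this gives roughly 144 dB, compared with roughly 96 dB for 16 bits.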
What Can You Do with Audio?
The sensor (microphone array) enables several user scenarios, such as:
- High-quality audio capture
- Focus on audio coming from a particular direction with beamforming
- Identification of the direction of audio sources
- Improved speech recognition as a result of audio capture and beamforming
- Raw voice data access
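To make the direction-identification scenario concrete, the following is a minimal sketch of one textbook approach: estimate the arrival-time difference between two microphones by cross-correlation, then convert that delay to an angle. It is not the algorithm the Kinect SDK uses internally. The function names, the speed-of-sound constant, and any sample rate or microphone spacing passed in are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Returns the lag (in samples) at which b best matches a delayed copy of a,
// searching lags in [-maxLag, +maxLag]. A positive result means the sound
// reached microphone b later than microphone a.
// Hypothetical helper for illustration; not a Kinect SDK function.
int EstimateLagSamples(const std::vector<double>& a,
                       const std::vector<double>& b,
                       int maxLag)
{
    assert(a.size() == b.size());
    assert(maxLag >= 0);
    const int n = static_cast<int>(a.size());
    int bestLag = 0;
    double bestScore = -1e300;
    for (int lag = -maxLag; lag <= maxLag; ++lag) {
        // Cross-correlation of a with b shifted by `lag` samples.
        double score = 0.0;
        for (int i = 0; i < n; ++i) {
            const int j = i + lag;
            if (j >= 0 && j < n) {
                score += a[i] * b[j];
            }
        }
        if (score > bestScore) {
            bestScore = score;
            bestLag = lag;
        }
    }
    return bestLag;
}

// Converts a lag between two microphones into an arrival angle in degrees,
// measured from the direction perpendicular to the array (far-field model):
// sin(theta) = c * tau / d, with tau = lag / sampleRate.
double LagToAngleDegrees(int lagSamples, double sampleRateHz,
                         double micSpacingMeters)
{
    assert(sampleRateHz > 0.0 && micSpacingMeters > 0.0);
    const double kSpeedOfSound = 343.0;  // m/s in air, assumed value
    const double kPi = 3.14159265358979323846;
    double s = kSpeedOfSound * lagSamples / (sampleRateHz * micSpacingMeters);
    // Clamp so that noise-induced lags beyond the physical limit stay valid.
    if (s > 1.0) s = 1.0;
    if (s < -1.0) s = -1.0;
    return std::asin(s) * 180.0 / kPi;
}
```

A lag of zero corresponds to a source directly in front of the array; larger lags move the estimate toward either end, and the clamp caps the result at plus or minus 90 degrees.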
Implementing Audio in a Native (Unmanaged) Application
A native application can implement these audio scenarios in one of two ways:
- Use the KinectAudio DirectX Media Object (DMO), as shown in the AudioBasics-D2D C++ sample
- Use the Windows Audio Session API (WASAPI), as shown in the AudioCaptureRaw-Console C++ sample
Using the KinectAudio DirectX Media Object (DMO)
Windows Vista, Windows 7, and Windows 8 include a voice-capture digital signal processor (DSP) that supports microphone arrays. Developers typically access that DSP through a DMO, which is a standard COM object that can be incorporated into a DirectShow graph or a Microsoft Media Foundation topology. The SDK includes an extended version of the Windows microphone array DMO, referred to here as the KinectAudio DMO, to support the Kinect microphone array.
Access the KinectAudio DMO in C++ by calling NuiGetAudioSource or INuiSensor::NuiGetAudioSource.
Using the Windows Audio Session API (WASAPI)
Use the Windows Audio Session API (WASAPI) to capture the raw audio stream as shown in the AudioCaptureRaw-Console C++ sample in the Developer Toolkit.
For more information about WASAPI, see About WASAPI (Windows).
Implementing Audio in a Managed Application
Managed applications use a KinectAudioSource object to implement all of the scenarios listed above.
KinectAudioSource Wraps a DirectX Media Object
A Windows DirectX Media Object (DMO) is a common Windows component for a single-channel microphone. Using this as a building block, the KinectAudioSource class extends the component with the following additional capabilities:
- An additional microphone mode, which is customized to support the Kinect microphone array
- Beamforming and source localization
- Noise suppression and automatic echo cancellation using the 24-bit ADC built into the DMO
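For readers unfamiliar with beamforming, the simplest form is delay-and-sum: delay each microphone channel so that sound from the desired direction lines up across channels, then average. The sketch below assumes whole-sample delays and is a generic illustration, not the processing performed inside the KinectAudio DMO; DelayAndSum is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Delay-and-sum beamformer with whole-sample steering delays.
// channels[m][i] is sample i of microphone m; delays[m] is the number of
// samples by which channel m is delayed before averaging, chosen so that
// sound from the desired direction lines up across all channels.
// Hypothetical helper for illustration; not a Kinect SDK function.
std::vector<double> DelayAndSum(
    const std::vector<std::vector<double>>& channels,
    const std::vector<int>& delays)
{
    assert(!channels.empty());
    assert(channels.size() == delays.size());
    const std::size_t n = channels[0].size();
    std::vector<double> out(n, 0.0);
    for (std::size_t m = 0; m < channels.size(); ++m) {
        assert(channels[m].size() == n);
        assert(delays[m] >= 0);
        const std::size_t d = static_cast<std::size_t>(delays[m]);
        // Shift channel m right by d samples and accumulate.
        for (std::size_t i = d; i < n; ++i) {
            out[i] += channels[m][i - d];
        }
    }
    // Average so that an aligned signal keeps its original amplitude.
    for (double& v : out) {
        v /= static_cast<double>(channels.size());
    }
    return out;
}
```

When the delays match the arrival pattern of a source, its signal adds coherently and keeps full amplitude, while sound from other directions is misaligned and partially averages out.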
Access the audio stream in managed code using the KinectSensor.AudioSource property.