How to implement audio streaming for VoIP calls for Windows Phone 8

[ This article is for Windows Phone 8 developers. If you’re developing for Windows 10, see the latest documentation. ]

This topic explains how to play and capture audio for VoIP applications. For an introduction to the structure of a VoIP application, see VoIP apps for Windows Phone 8.


Overview of audio capture and render

Windows Phone 8 includes a subset of the Windows Audio Session API (WASAPI) that lets VoIP applications capture and render audio streams. Your application uses the WASAPI interface IAudioRenderClient to render audio and IAudioCaptureClient to capture audio. On the phone, these interfaces are used in much the same way as in a Windows application. The biggest difference from Windows is how routing between audio endpoints is managed. A desktop application manages routing with the WASAPI IMMDevice interface, but this interface is not supported on the phone.
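Because the render and capture interfaces behave much as they do on Windows, the usual shared-mode buffering arithmetic still applies. The app can write only as many frames as the render buffer holds minus the frames the audio engine has not yet consumed. In WASAPI these two values come from IAudioClient::GetBufferSize and IAudioClient::GetCurrentPadding. The minimal sketch below models only that arithmetic in portable C++ so it can be compiled and checked anywhere. It assumes integer PCM, and the helper names BlockAlign and WritableFrames are hypothetical, not part of WASAPI.

```cpp
#include <cstdint>
#include <stdexcept>

// Bytes in one audio frame (one sample for every channel). For integer PCM
// this equals WAVEFORMATEX::nBlockAlign. Hypothetical helper for illustration.
constexpr uint32_t BlockAlign(uint16_t channels, uint16_t bitsPerSample) {
    return static_cast<uint32_t>(channels) * (bitsPerSample / 8u);
}

// Frames the app may write into the render buffer right now without
// overwriting queued data the audio engine has not played yet.
// bufferFrameCount: value reported by IAudioClient::GetBufferSize.
// paddingFrames:    value reported by IAudioClient::GetCurrentPadding.
// Hypothetical helper for illustration.
uint32_t WritableFrames(uint32_t bufferFrameCount, uint32_t paddingFrames) {
    if (paddingFrames > bufferFrameCount) {
        throw std::invalid_argument("padding cannot exceed buffer size");
    }
    return bufferFrameCount - paddingFrames;
}
```

For example, with 16-bit mono audio, a 1600-frame buffer and 640 frames of padding leave 960 writable frames, or 1920 bytes. In a real render loop that frame count would be passed to IAudioRenderClient::GetBuffer. The app fills the returned buffer with PCM data and then calls IAudioRenderClient::ReleaseBuffer with the number of frames actually written.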

VoIP applications should obtain the capture and render device IDs for the Communications role by calling GetDefaultAudioCaptureId and GetDefaultAudioRenderId with the AudioDeviceRole.Communications enum value. Once you have these IDs, activate the associated interfaces by calling ActivateAudioInterface.

Creating the audio interfaces with AudioDeviceRole.Communications gives them some special behaviors on the phone. First, audio streams that use these interfaces receive a higher priority than other audio, such as the stream from a music player. More significantly, audio endpoint routing can be managed with the AudioRoutingManager class. Call SetAudioEndpoint(AudioRoutingEndpoint) to request that audio be routed to the default endpoint, the speakerphone, or a Bluetooth headset. To find out which endpoint is currently active, call GetAudioEndpoint or subscribe to the AudioEndpointChanged event.

General-purpose apps should use the AudioDeviceRole.Default enum value when requesting audio device IDs with GetDefaultAudioCaptureId and GetDefaultAudioRenderId. The AudioDeviceRole.Communications role is intended for VoIP applications only.


Your application must have the ID_CAP_MICROPHONE capability to capture audio. To render or capture audio in the background, or to mute or pause lower-priority audio streams during playback, your application must have the ID_CAP_VOIP capability and must use the Communications audio device role. For more information on capabilities, see App capabilities and hardware requirements for Windows Phone 8.

The following should also be considered when implementing audio streaming for VoIP applications on Windows Phone 8:

  • The application is responsible for converting received audio data to PCM before rendering.

  • Captured audio is returned in PCM format. The application is responsible for encoding it into whatever format is needed for network transmission.

  • The application is responsible for implementing any echo cancellation, noise reduction, or gain control in software.

  • There is no built-in audio/video synchronization mechanism.

  • Applications have access to only a single microphone input stream. Dual-microphone input for noise suppression or other audio processing is not supported.
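The article leaves codec choice and signal processing to the application. As one illustration, suppose the remote party sends G.711 mu-law, a common VoIP codec chosen here only as an example. The received bytes must then be expanded to 16-bit PCM before they are written to the render buffer, and any gain control must be applied to those samples in software. The portable C++ sketch below shows both steps. The mu-law decoder follows the widely used G.711 reference expansion. The function names MuLawToPcm16, DecodeMuLaw, and ApplyGain are hypothetical and are not part of any phone API.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Expand one G.711 mu-law byte to a 16-bit linear PCM sample.
int16_t MuLawToPcm16(uint8_t encoded) {
    const int kBias = 0x84;                             // 132, the mu-law encoding bias
    const uint8_t u = static_cast<uint8_t>(~encoded);   // mu-law bytes are transmitted inverted
    int magnitude = ((u & 0x0F) << 3) + kBias;          // 4-bit mantissa plus bias
    magnitude <<= (u & 0x70) >> 4;                      // scale by the 3-bit exponent (segment)
    // The top bit of the inverted byte is the sign; remove the bias to get the sample.
    return static_cast<int16_t>((u & 0x80) ? (kBias - magnitude) : (magnitude - kBias));
}

// Decode a whole received payload into PCM samples ready for rendering.
std::vector<int16_t> DecodeMuLaw(const std::vector<uint8_t>& payload) {
    std::vector<int16_t> pcm;
    pcm.reserve(payload.size());
    for (uint8_t b : payload) {
        pcm.push_back(MuLawToPcm16(b));
    }
    return pcm;
}

// Apply a fixed software gain to 16-bit PCM. Results are saturated to the
// int16_t range so that loud samples clip instead of wrapping around.
void ApplyGain(std::vector<int16_t>& samples, float gain) {
    for (int16_t& s : samples) {
        const float scaled = static_cast<float>(s) * gain;
        const float clamped = std::max(-32768.0f, std::min(32767.0f, scaled));
        s = static_cast<int16_t>(clamped);
    }
}
```

A real VoIP client would typically use adaptive gain control and pair decoding with a matching encoder on the capture path. The fixed gain here only shows where such processing sits: after decoding and before rendering.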

Supported WASAPI APIs

Windows Phone 8 supports a subset of the APIs exposed by WASAPI. For a list of the APIs that are available to use on the phone, see Audio Capture and Render APIs for native code for Windows Phone.