Implementing MediaStream Sources

Microsoft Silverlight will reach end of support after October 2021.

This topic describes how to create parsers for container formats and delivery mechanisms that Silverlight does not natively support.

Opening Media

When a MediaStreamSource object is passed to the MediaElement.SetSource method, the MediaElement changes its state to Opening and calls the MediaStreamSource.OpenMediaAsync method. The MediaStreamSource implementation should respond by calling ReportOpenMediaCompleted once it can describe the media to Silverlight.

For the media, the following information must be included in the description:

  • The duration.

  • Whether or not the media supports seeking.

  • For protected content, the DRM metadata needed to locate an appropriate license.

The description will also contain information about one or more streams. Each stream’s description must include the following information:

  • The identity of the codec.

  • A set of bytes, called the codec private data, to initialize the codec.

If the media has a video stream, the description of the stream must also include the following:

  • Width and height of the original encoded images.

This information is passed to Silverlight by the MediaStreamSource.ReportOpenMediaCompleted method. The ReportOpenMediaCompleted method takes the following parameters:

  • A dictionary of attributes and values describing the media.

  • A collection of MediaStreamDescription objects for both audio and video. Each object is created with a dictionary of attributes and values conveying the above stream information.

Video CodecPrivateData

Video codecs are identified by a four-character code (FourCC) stored with MediaStreamAttributeKeys.VideoFourCC. Codec initialization bytes are stored with MediaStreamAttributeKeys.CodecPrivateData as a base16-encoded string. The per-codec details are as follows:

H.264

VideoFourCC

"H264"

CodecPrivateData

A base16-encoded string of the form:

0x00000001 SequenceParameterSet 0x00000001 PictureParameterSet

See ISO/IEC 14496-10 for details on start codes and the Sequence Parameter Set (SPS) and Picture Parameter Set (PPS) formats.

CodecPrivateData Example:

Example for 640x360 @ 1Mbps

"00000001674D401E965201405FF2E020100000000168EF3880"

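As a sketch of how this string can be produced and checked, the following Python splits the example above back into its SPS and PPS NAL units and rebuilds it. The helper names are hypothetical, and reading profile_idc and level_idc from bytes 1 and 3 of the SPS assumes the fixed start of the SPS syntax in ISO/IEC 14496-10; in a Silverlight application the resulting string would be what is stored with MediaStreamAttributeKeys.CodecPrivateData.

```python
from typing import List, Tuple

START_CODE = bytes.fromhex("00000001")


def build_h264_cpd(sps: bytes, pps: bytes) -> str:
    """Prefix the SPS and PPS with 4-byte start codes and base16-encode the result."""
    return (START_CODE + sps + START_CODE + pps).hex().upper()


def split_h264_cpd(cpd: str) -> List[bytes]:
    """Split a CodecPrivateData string into the NAL units between start codes.

    Emulation prevention keeps 00 00 00 01 from appearing inside a NAL unit,
    so a plain byte split is sufficient for this sketch.
    """
    return [nal for nal in bytes.fromhex(cpd).split(START_CODE) if nal]


def sps_profile_level(sps: bytes) -> Tuple[int, int]:
    """Return (profile_idc, level_idc) from an SPS NAL unit."""
    if (sps[0] & 0x1F) != 7:  # nal_unit_type 7 is a sequence parameter set
        raise ValueError("not an SPS NAL unit")
    return sps[1], sps[3]     # byte 2 holds the constraint flags


sps, pps = split_h264_cpd("00000001674D401E965201405FF2E020100000000168EF3880")
print(sps.hex().upper(), pps.hex().upper(), sps_profile_level(sps))
# prints: 674D401E965201405FF2E02010 68EF3880 (77, 30)
```

For this example, profile_idc 77 (0x4D) is the Main profile and level_idc 30 (0x1E) is level 3.0.
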
VC-1: Advanced profile

VideoFourCC

"WVC1"

CodecPrivateData

A base16-encoded string of the form:

ASFBindingByte 0x0000010F SequenceLayer 0x0000010E EntryPointLayer

SequenceLayer and EntryPointLayer are in encapsulated (byte-stuffed) form. See the VC-1 specification (SMPTE 421M) for details on start code suffixes and the Sequence Layer and Entry Point Layer formats. See the ASF specification for details on the ASF binding byte format.

CodecPrivateData Example:

Example for 1280x720 @ 2.436Mbps

"250000010FD3BE27F1678A27F859E80450824A56DCECC00000010E5A67F840"

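The layout above can be checked by splitting the example at its start codes and reading the first fields of the sequence header. This Python sketch uses hypothetical helper names; the field order and widths follow the Advanced Profile sequence header in SMPTE 421M, only the first eight fields are read, and the ASF binding byte is returned without being decoded.

```python
from typing import Dict, Tuple

SEQUENCE_HEADER_SC = bytes.fromhex("0000010F")
ENTRY_POINT_SC = bytes.fromhex("0000010E")


def split_wvc1_cpd(cpd: str) -> Tuple[int, bytes, bytes]:
    """Return (ASF binding byte, sequence layer, entry point layer)."""
    data = bytes.fromhex(cpd)
    seq_start = data.index(SEQUENCE_HEADER_SC, 1) + len(SEQUENCE_HEADER_SC)
    ep_pos = data.index(ENTRY_POINT_SC, seq_start)
    return data[0], data[seq_start:ep_pos], data[ep_pos + len(ENTRY_POINT_SC):]


def remove_stuffing(ebdu: bytes) -> bytes:
    """Undo byte stuffing: drop each 0x03 that follows two zero bytes."""
    out = bytearray()
    zeros = 0
    for b in ebdu:
        if zeros >= 2 and b == 0x03:
            zeros = 0  # skip the emulation-prevention byte
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


def read_bits(data: bytes, pos: int, n: int) -> Tuple[int, int]:
    """Read n bits, most significant first, starting at bit pos; return (value, new_pos)."""
    value = 0
    for _ in range(n):
        value = (value << 1) | ((data[pos // 8] >> (7 - pos % 8)) & 1)
        pos += 1
    return value, pos


def wvc1_sequence_info(sequence_layer: bytes) -> Dict[str, int]:
    """Read profile, level, and maximum coded size from the sequence header."""
    data = remove_stuffing(sequence_layer)
    fields: Dict[str, int] = {}
    pos = 0
    for name, nbits in [("PROFILE", 2), ("LEVEL", 3), ("COLORDIFF_FORMAT", 2),
                        ("FRMRTQ_POSTPROC", 3), ("BITRTQ_POSTPROC", 5),
                        ("POSTPROCFLAG", 1), ("MAX_CODED_WIDTH", 12),
                        ("MAX_CODED_HEIGHT", 12)]:
        fields[name], pos = read_bits(data, pos, nbits)
    return {"profile": fields["PROFILE"], "level": fields["LEVEL"],
            # The coded-size fields store (dimension / 2) - 1.
            "width": (fields["MAX_CODED_WIDTH"] + 1) * 2,
            "height": (fields["MAX_CODED_HEIGHT"] + 1) * 2}


binding, seq, entry = split_wvc1_cpd(
    "250000010FD3BE27F1678A27F859E80450824A56DCECC00000010E5A67F840")
print(hex(binding), wvc1_sequence_info(seq))
# prints: 0x25 {'profile': 3, 'level': 2, 'width': 1280, 'height': 720}
```

Profile 3 is the Advanced profile, and the decoded 1280x720 size matches the description of the example.
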
VC-1: Main and Simple profiles

VideoFourCC

"WMV3"

CodecPrivateData

A base16-encoded string of the form:

0x0000010F SequenceLayer

SequenceLayer is in encapsulated (byte-stuffed) form. See the VC-1 specification (SMPTE 421M) for details on start code suffixes and the Sequence Layer format.

MPEG-4 Part 2: Simple & Advanced profiles

VideoFourCC

"MP43"

CodecPrivateData

No CodecPrivateData needed.

Windows Media Video 7 through 9

VideoFourCC

"WMVA"

See VC-1 advanced profile.

VideoFourCC

"WMV2"

See VC-1 main and simple profile.

VideoFourCC

"WMV1"

See VC-1 main and simple profile.

RGBA

VideoFourCC

"RGBA"

CodecPrivateData

No CodecPrivateData needed.

YV12

VideoFourCC

"YV12"

CodecPrivateData

No CodecPrivateData needed.

Note:

For Windows developers: MSDN documentation on Windows codecs may refer to the VIDEOINFOHEADER, VIDEOINFOHEADER2, MPEG2VIDEOINFO, or similar structures. The CodecPrivateData used by Silverlight corresponds to the variable-length data appended to the end of those structures.

Audio CodecPrivateData

Audio codecs are identified and initialized using a WAVEFORMATEX structure. In some cases, additional data follows the WAVEFORMATEX structure. In other cases, the WAVEFORMATEX structure is part of a larger structure. In all cases, the structure (with its multi-byte fields in little-endian byte order) and any data that follows it are together converted into a base16-encoded string and stored with MediaStreamAttributeKeys.CodecPrivateData.

The first 16 bits of the CodecPrivateData always correspond to the first member of the WAVEFORMATEX structure, wFormatTag. Because the value is little-endian, a string beginning with "FF00" has a wFormatTag of 0x00FF. This value identifies the codec and tells Silverlight how to interpret the remaining data. The Windows Platform SDK header file mmreg.h contains many of the enumerations and structures described below.
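
A minimal sketch of reading the common WAVEFORMATEX header back out of a CodecPrivateData string, using Python's struct module with the little-endian layout of the seven WAVEFORMATEX members (18 bytes, no padding). The function name is hypothetical, and the cbSize bytes that follow the header are returned unparsed because their meaning depends on wFormatTag.

```python
import struct
from typing import Dict, Tuple

# wFormatTag, nChannels, nSamplesPerSec, nAvgBytesPerSec, nBlockAlign,
# wBitsPerSample, cbSize: 18 bytes, little-endian, no padding.
WAVEFORMATEX = struct.Struct("<HHIIHHH")
FIELD_NAMES = ("wFormatTag", "nChannels", "nSamplesPerSec", "nAvgBytesPerSec",
               "nBlockAlign", "wBitsPerSample", "cbSize")


def parse_wave_format(cpd: str) -> Tuple[Dict[str, int], bytes]:
    """Decode the WAVEFORMATEX header and return it with the cbSize bytes that follow."""
    data = bytes.fromhex(cpd)
    wfx = dict(zip(FIELD_NAMES, WAVEFORMATEX.unpack_from(data)))
    extra = data[WAVEFORMATEX.size:WAVEFORMATEX.size + wfx["cbSize"]]
    if len(extra) != wfx["cbSize"]:
        raise ValueError("CodecPrivateData is shorter than cbSize indicates")
    return wfx, extra


wfx, extra = parse_wave_format("FF00020080BB0000E02E00000400100002001190")
print(hex(wfx["wFormatTag"]), wfx["nSamplesPerSec"], extra.hex().upper())
# prints: 0xff 48000 1190
```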

Per-codec details are as follows:

AAC (ISO Advanced Audio Coding)

The following AAC format is recommended for desktop Silverlight and Silverlight for Windows Phone:

wfx.wFormatTag

0xFF (WAVE_FORMAT_RAW_AAC1)

CodecPrivateData

WAVEFORMATEX structure followed by AudioSpecificConfig data.

The AudioSpecificConfig structure is described in ISO/IEC 14496-3.

The following example shows the CodecPrivateData for HE-AAC v2 at 48kHz stereo content (SBR and PS enabled).

"FF000100C05D0000E02E0000040010000500130856E598"

The following example shows the CodecPrivateData for AAC-LC at 48kHz stereo 96kbps:

"FF00020080BB0000E02E00000400100002001190"

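The examples above can be reproduced by packing a WAVEFORMATEX header and appending the AudioSpecificConfig bytes. This Python sketch assumes the simple two-byte AudioSpecificConfig layout (5-bit object type, 4-bit sampling-frequency index, 4-bit channel configuration, 3 zero flag bits), which does not cover every ISO/IEC 14496-3 variant. The nBlockAlign value of 4 and wBitsPerSample value of 16 are copied from the examples rather than from a stated requirement, and the function names are hypothetical.

```python
import struct

WAVE_FORMAT_RAW_AAC1 = 0x00FF
# Sampling-frequency index table from ISO/IEC 14496-3 (indices 0-12).
AAC_SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000, 24000,
                    22050, 16000, 12000, 11025, 8000, 7350]


def simple_audio_specific_config(object_type: int, sample_rate: int, channels: int) -> bytes:
    """Two-byte AudioSpecificConfig: object type, frequency index, channel
    configuration, then three zero GASpecificConfig flag bits."""
    index = AAC_SAMPLE_RATES.index(sample_rate)
    value = (object_type << 11) | (index << 7) | (channels << 3)
    return value.to_bytes(2, "big")


def raw_aac1_cpd(sample_rate: int, channels: int, avg_bytes_per_sec: int,
                 asc: bytes, block_align: int = 4, bits_per_sample: int = 16) -> str:
    """WAVEFORMATEX with wFormatTag 0xFF, followed by the AudioSpecificConfig bytes."""
    wfx = struct.pack("<HHIIHHH", WAVE_FORMAT_RAW_AAC1, channels, sample_rate,
                      avg_bytes_per_sec, block_align, bits_per_sample, len(asc))
    return (wfx + asc).hex().upper()


# AAC-LC (object type 2), 48kHz stereo, 96kbps = 12000 bytes per second.
print(raw_aac1_cpd(48000, 2, 12000, simple_audio_specific_config(2, 48000, 2)))
# prints: FF00020080BB0000E02E00000400100002001190
```

The same function reproduces the HE-AAC v2 example when given the pre-SBR rate, the pre-PS channel count, and that example's five-byte AudioSpecificConfig.
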
The following AAC format is recommended for Silverlight for Windows Phone:

wfx.wFormatTag

0x1610 (WAVE_FORMAT_MPEG_HEAAC)

CodecPrivateData

HEAACWAVEFORMAT structure followed by AudioSpecificConfig data

The AudioSpecificConfig structure is described in ISO/IEC 14496-3.

The following example shows the CodecPrivateData for HE-AAC v2 at 48kHz stereo content (SBR and PS enabled):

"10160100C05D0000E02E00000400100011000000FE000000000000000000130856E598"

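The HEAACWAVEFORMAT variant places the HEAACWAVEINFO fields (wPayloadType, wAudioProfileLevelIndication, wStructType, wReserved1, dwReserved2; 12 bytes in total) between the WAVEFORMATEX header and the AudioSpecificConfig, and cbSize counts both. The Python sketch below rebuilds the example above; the defaults of 0 for wPayloadType and 0xFE for wAudioProfileLevelIndication are taken from that example, and the function name is hypothetical.

```python
import struct

WAVE_FORMAT_MPEG_HEAAC = 0x1610


def heaac_cpd(sample_rate: int, channels: int, avg_bytes_per_sec: int, asc: bytes,
              payload_type: int = 0, profile_level: int = 0xFE) -> str:
    """WAVEFORMATEX + HEAACWAVEINFO, followed by the AudioSpecificConfig bytes."""
    # HEAACWAVEINFO: wPayloadType, wAudioProfileLevelIndication, wStructType,
    # wReserved1, dwReserved2 (12 bytes, little-endian).
    info = struct.pack("<HHHHI", payload_type, profile_level, 0, 0, 0)
    # nBlockAlign = 4 and wBitsPerSample = 16, as in the example above.
    wfx = struct.pack("<HHIIHHH", WAVE_FORMAT_MPEG_HEAAC, channels, sample_rate,
                      avg_bytes_per_sec, 4, 16, len(info) + len(asc))
    return (wfx + info + asc).hex().upper()


# HE-AAC v2: pre-SBR rate 24000 Hz and pre-PS mono, as in the example above.
print(heaac_cpd(24000, 1, 12000, bytes.fromhex("130856E598")))
```
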
The following is a deprecated AAC format:

wfx.wFormatTag

0x1601 (WAVE_FORMAT_MPEG_RAW_AAC)

CodecPrivateData

WAVEFORMATEX structure

With regard to HE-AAC, the nSamplesPerSec field in WAVEFORMATEX should hold the pre-SBR (Spectral Band Replication) sample rate, and the nChannels field should hold the pre-PS (Parametric Stereo) channel count.
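
Because the AudioSpecificConfig already carries the core (pre-SBR) sampling-frequency index and the core (pre-PS) channel configuration, both WAVEFORMATEX values can be derived from it. The Python sketch below handles only the 4-bit frequency index, explicit SBR/PS signaling (object types 5 and 29), and backward-compatible SBR signaling appended to an AAC-LC configuration; other ISO/IEC 14496-3 variants are out of scope, and the names are hypothetical. Applied to the HE-AAC v2 AudioSpecificConfig used in the examples above (130856E598), it yields 24000 Hz and one channel, which matches their nSamplesPerSec and nChannels fields.

```python
from typing import Dict

AAC_SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000, 24000,
                    22050, 16000, 12000, 11025, 8000, 7350]


class BitReader:
    """Reads bit fields most significant bit first."""

    def __init__(self, data: bytes) -> None:
        self.data, self.pos = data, 0

    def remaining(self) -> int:
        return len(self.data) * 8 - self.pos

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            value = (value << 1) | ((self.data[self.pos // 8] >> (7 - self.pos % 8)) & 1)
            self.pos += 1
        return value


def rate(index: int) -> int:
    if index >= len(AAC_SAMPLE_RATES):
        raise ValueError("explicit or reserved sampling-frequency index not handled")
    return AAC_SAMPLE_RATES[index]


def wave_format_values(asc: bytes) -> Dict[str, int]:
    """Return pre-SBR nSamplesPerSec, pre-PS nChannels, and the decoded output rate."""
    r = BitReader(asc)
    object_type = r.read(5)
    core_rate = rate(r.read(4))     # core rate, before SBR doubles it
    channel_config = r.read(4)      # core channels, before PS upmixes to stereo
    output_rate = core_rate
    if object_type in (5, 29):      # explicit SBR (5) or SBR + PS (29) signaling
        output_rate = rate(r.read(4))
    elif object_type == 2 and channel_config != 0:
        r.read(1)                   # frameLengthFlag
        if r.read(1):               # dependsOnCoreCoder
            r.read(14)              # coreCoderDelay
        if r.read(1):               # extensionFlag
            r.read(1)               # extensionFlag3
        # Backward-compatible SBR signaling appended after the AAC-LC config.
        if r.remaining() >= 16 and r.read(11) == 0x2B7 and r.read(5) == 5 and r.read(1):
            output_rate = rate(r.read(4))
    return {"nSamplesPerSec": core_rate, "nChannels": channel_config,
            "output_rate": output_rate}


print(wave_format_values(bytes.fromhex("130856E598")))
# prints: {'nSamplesPerSec': 24000, 'nChannels': 1, 'output_rate': 48000}
```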

WMA (Windows Media Audio)

The following table shows the codec information for Windows Media Audio V3.

wfx.wFormatTag

0x162 (WAVE_FORMAT_WMAUDIO3)

CodecPrivateData

WMAUDIO3WAVEFORMAT structure

The following example shows a Windows Media Audio V3 CodecPrivateData for 44.1kHz stereo:

"6201020044AC0000451F0000CF05100012001000030000000000000000000000E00042C0"

The following table shows the codec information for Windows Media Audio V2.

wfx.wFormatTag

0x161 (WAVE_FORMAT_WMAUDIO2)

CodecPrivateData

WMAUDIO2WAVEFORMAT structure

The following table shows the codec information for Windows Media Audio V1.

wfx.wFormatTag

0x160 (WAVE_FORMAT_MSAUDIO1)

CodecPrivateData

MSAUDIO1WAVEFORMAT structure

MP3 (ISO MPEG-1 Layer III)

wfx.wFormatTag

0x55 (WAVE_FORMAT_MPEGLAYER3)

CodecPrivateData

MPEGLAYER3WAVEFORMAT structure

See ISO/IEC 13818-3 for details on MP3.

The following example shows an MP3 (ISO MPEG-1 Layer III) CodecPrivateData at 22.05kHz stereo 80kbps with 522 bytes per frame, one frame per block:

"550002002256000010270000010000000C000100000000000A0201000000"

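The example above is a WAVEFORMATEX header followed by the 12 bytes that MPEGLAYER3WAVEFORMAT adds (wID, fdwFlags, nBlockSize, nFramesPerBlock, nCodecDelay). A Python sketch that rebuilds it; the function name is hypothetical, and nBlockAlign = 1, wBitsPerSample = 0, wID = 1 (MPEGLAYER3_ID_MPEG), fdwFlags = 0 (MPEGLAYER3_FLAG_PADDING_ISO), and nCodecDelay = 0 are the values found in the example rather than a general recommendation.

```python
import struct

WAVE_FORMAT_MPEGLAYER3 = 0x0055
MPEGLAYER3_ID_MPEG = 1
MPEGLAYER3_FLAG_PADDING_ISO = 0


def mp3_cpd(sample_rate: int, channels: int, bitrate: int, block_size: int,
            frames_per_block: int = 1, codec_delay: int = 0) -> str:
    """MPEGLAYER3WAVEFORMAT: WAVEFORMATEX followed by 12 bytes of MP3 fields."""
    # wID, fdwFlags, nBlockSize, nFramesPerBlock, nCodecDelay (little-endian)
    mp3 = struct.pack("<HIHHH", MPEGLAYER3_ID_MPEG, MPEGLAYER3_FLAG_PADDING_ISO,
                      block_size, frames_per_block, codec_delay)
    # nAvgBytesPerSec is the bitrate in bytes; nBlockAlign 1, wBitsPerSample 0.
    wfx = struct.pack("<HHIIHHH", WAVE_FORMAT_MPEGLAYER3, channels, sample_rate,
                      bitrate // 8, 1, 0, len(mp3))
    return (wfx + mp3).hex().upper()


print(mp3_cpd(22050, 2, 80000, 522))
```
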
PCM / WAV

wfx.wFormatTag

1 (WAVE_FORMAT_PCM)

CodecPrivateData

WAVEFORMATEX structure with no codec-specific data.
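
For PCM, nBlockAlign and nAvgBytesPerSec are conventionally derived from the other fields: nBlockAlign = nChannels × wBitsPerSample / 8, and nAvgBytesPerSec = nSamplesPerSec × nBlockAlign. A Python sketch (the function name is hypothetical) that builds a PCM CodecPrivateData this way:

```python
import struct

WAVE_FORMAT_PCM = 1


def pcm_cpd(sample_rate: int, channels: int, bits_per_sample: int) -> str:
    """WAVEFORMATEX for PCM, with cbSize 0 because no codec-specific data follows."""
    block_align = channels * bits_per_sample // 8   # bytes per sample frame
    avg_bytes = sample_rate * block_align           # bytes per second
    wfx = struct.pack("<HHIIHHH", WAVE_FORMAT_PCM, channels, sample_rate,
                      avg_bytes, block_align, bits_per_sample, 0)
    return wfx.hex().upper()


print(pcm_cpd(44100, 2, 16))
# prints: 0100020044AC000010B10200040010000000
```

For 44.1kHz stereo 16-bit audio, this gives nBlockAlign 4 and nAvgBytesPerSec 176400.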

The following example shows a PCM / WAV CodecPrivateData at 48kHz stereo, 16 bits per sample:

"0100020080BB0000E02E0000040010000000"

Note:

For IIS developers: In the documentation for version 2.0 of the IIS Smooth Streaming manifest format, the WAVEFORMATEX portion of CodecPrivateData is omitted; it is instead constructed programmatically from XML attributes.

State Management

Media States

While a MediaElement has an explicit state, defined by MediaElement.CurrentState, a MediaStreamSource object does not. Instead, the state of a MediaStreamSource is determined by which of its methods have been invoked. For example, if the SeekAsync method of a MediaStreamSource has been called and the implementation has not yet responded by calling ReportSeekCompleted, it is considered to be in a "seeking" state. At any given time, a MediaStreamSource is usually performing one of the following activities:

Stream States

A MediaStreamSource implementation might find it useful to maintain per-stream state as well. For example, an application might stop downloading audio if its audio buffer queue is full.

Synchronization

The MediaElement calls only one MediaStreamSource method at a time. Therefore, a SeekAsync request will not arrive at the same time as a GetSampleAsync request, nor will two GetSampleAsync requests arrive concurrently. However, there are still some cases that a MediaStreamSource developer must consider:

Seeking

  • In desktop Silverlight, a SeekAsync request will not be delivered until all outstanding sample requests have been completed. If an outstanding request cannot yet be fulfilled with new data, one solution is to complete it by reporting the previous sample a second time.

  • In Silverlight for Windows Phone, a SeekAsync request will be delivered if an outstanding sample request is pending. However, when that sample is completed, Silverlight for Windows Phone will erroneously throw it away. If that sample is a key frame, it may be necessary to report the sample a second time.
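
The desktop workaround can be modeled as simple bookkeeping: remember which streams have an unanswered sample request and the last sample reported for each, and, when the application is about to seek, answer every pending request with that previous sample. The Python below is a language-neutral model rather than Silverlight code; OutstandingRequests, Sample, and report_sample are hypothetical names, and report_sample stands in for a call to MediaStreamSource.ReportGetSampleCompleted. A request that arrives before any sample has been reported is left pending in this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set


@dataclass
class Sample:
    stream: str      # hypothetical stream key, e.g. "audio" or "video"
    timestamp: int   # 100-ns ticks
    payload: bytes


class OutstandingRequests:
    """Tracks sample requests that have not been answered yet."""

    def __init__(self, report_sample: Callable[[Sample], None]) -> None:
        self._report_sample = report_sample  # stands in for ReportGetSampleCompleted
        self._pending: Set[str] = set()
        self._last: Dict[str, Sample] = {}

    def request_arrived(self, stream: str) -> None:
        self._pending.add(stream)

    def complete(self, sample: Sample) -> None:
        self._pending.discard(sample.stream)
        self._last[sample.stream] = sample
        self._report_sample(sample)

    def unblock_seek(self) -> None:
        """Answer each pending request by reporting the previous sample again."""
        for stream in sorted(self._pending):
            previous = self._last.get(stream)
            if previous is not None:
                self.complete(previous)

    @property
    def pending(self) -> Set[str]:
        return set(self._pending)
```

On Windows Phone, the same record of the last reported sample per stream can be used to report a key frame a second time after the seek.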

Buffering transitions

Buffer Management

One of the most challenging tasks in creating a MediaStreamSource is implementing buffer management logic. Poor logic can result in excessive entries into the Buffering state and post-starvation audio/video (AV) sync issues.

A MediaStreamSource implementation must take into account two buffers:

  1. The buffer maintained by the pipeline. This buffer consists of all the samples reported by the MediaStreamSource that have not yet been rendered.

  2. The buffer maintained by the MediaStreamSource implementation itself. This buffer may be especially deep when the MediaElement is in the Buffering state; for instance, when the pipeline is not draining samples.

A MediaStreamSource implementation might look like the following illustration:

MediaStreamSource implementation

During normal playback:

  1. Media bytes arrive from the network. These bytes are appended to a XAP-managed media buffer, which may contain several seconds of content.

  2. The XAP parses out individual audio and video samples, and adds buffer references to an audio or video queue.

  3. The XAP responds to Silverlight sample requests, transferring samples from its media buffer to Silverlight’s pipeline buffer. The XAP buffer no longer needs to retain that sample.

  4. The sample is presented to the user. The timing is specified by the sample’s timestamp.
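
The four steps above can be sketched as one FIFO queue per stream that holds references into the downloaded bytes. In this Python model (not Silverlight code), XapMediaBuffer and ParsedSample are hypothetical names, timestamps are assumed to be 100-nanosecond ticks, and the container parsing of step 2 is represented only by the enqueue call.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional


@dataclass
class ParsedSample:
    timestamp: int      # presentation time, 100-ns ticks
    duration: int       # 100-ns ticks
    data: memoryview    # reference into the downloaded media bytes (no copy)


class XapMediaBuffer:
    """One FIFO queue of parsed samples per stream."""

    def __init__(self) -> None:
        self.queues: Dict[str, Deque[ParsedSample]] = {"audio": deque(), "video": deque()}

    def enqueue(self, stream: str, sample: ParsedSample) -> None:
        # Step 2: the parser adds a reference to each demultiplexed sample.
        self.queues[stream].append(sample)

    def dequeue_for_request(self, stream: str) -> Optional[ParsedSample]:
        # Step 3: answer a sample request with the oldest queued sample;
        # after handing it to the pipeline, the XAP no longer keeps it.
        queue = self.queues[stream]
        return queue.popleft() if queue else None

    def buffered_ticks(self, stream: str) -> int:
        # Span of media time still held by the XAP for this stream.
        queue = self.queues[stream]
        if not queue:
            return 0
        return queue[-1].timestamp + queue[-1].duration - queue[0].timestamp
```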

Pipeline Buffers

The sizes of the Silverlight pipeline buffers are influenced by the MediaStreamSource.AudioBufferLength property, which is measured in milliseconds. The MediaElement.BufferingTime is ignored for MediaStreamSource classes. Developers can either use the default value for AudioBufferLength, which is 1000 (one second), or set the property to a different value prior to calling ReportOpenMediaCompleted.

How the AudioBufferLength property is interpreted differs between platforms:

  • In desktop Silverlight, the buffer depth of the audio stream, as measured in time, is set by AudioBufferLength.

  • In Silverlight for Windows Phone, the audio and video buffer depths are influenced by AudioBufferLength, but are sized in bytes assuming poor compression ratios. As such, the effective buffer depth can be much deeper in practice.

The most accurate and reliable way to calculate the current depth of a pipeline stream buffer is to subtract MediaElement.Position from the timestamp of the last reported sample for that stream, and then add the duration of the sample.
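
Written out, the calculation above is depth = (timestamp of the last reported sample + its duration) minus MediaElement.Position. A small Python sketch, assuming every value is expressed in 100-nanosecond ticks (the unit of TimeSpan.Ticks); the function names are hypothetical.

```python
TICKS_PER_MILLISECOND = 10_000  # one tick is 100 ns


def pipeline_depth_ticks(last_timestamp: int, last_duration: int, position: int) -> int:
    """(timestamp of the last reported sample + its duration) - MediaElement.Position.

    A result of zero or less means the pipeline has nothing left to render
    for that stream.
    """
    return last_timestamp + last_duration - position


def pipeline_depth_ms(last_timestamp: int, last_duration: int, position: int) -> float:
    """The same depth in milliseconds, the unit used by AudioBufferLength."""
    return pipeline_depth_ticks(last_timestamp, last_duration, position) / TICKS_PER_MILLISECOND


# Last audio sample at 5.0 s lasting 40 ms, playback position at 4.0 s.
print(pipeline_depth_ms(50_000_000, 400_000, 40_000_000))
# prints: 1040.0
```

Comparing the result with AudioBufferLength (1000 ms by default) gives a rough sense of how full the desktop audio pipeline buffer is; on Windows Phone the effective buffer can be deeper, as described above.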

Starvation

Starvation occurs when:

How this situation is handled depends on the platform:

  • In desktop Silverlight, the MediaElement remains in the Playing state, but the Position stops advancing.

  • In Silverlight for Windows Phone, the MediaElement remains in the Playing state, but the Position continues to advance. When samples arrive, they may be discarded if the position has advanced past their timestamps. In some circumstances, AV sync issues may arise when playback continues. This behavior may be changed to match desktop Silverlight in a subsequent release.

A successful MediaStreamSource implementation will ensure the Silverlight pipeline never starves. This is accomplished by monitoring the depth of each pipeline buffer. If a video or audio pipeline buffer is about to be empty, a MediaStreamSource implementation should perform the following actions:

After the media buffer of the XAP is rebuilt, a MediaStreamSource implementation should perform the following actions: