Implementing MediaStream Sources
This topic describes how to create parsers for container formats and delivery mechanisms not natively supported by Silverlight.
Opening Media
When a MediaStreamSource object is passed to the MediaElement.SetSource method, the MediaElement changes its state to Opening and calls the MediaStreamSource.OpenMediaAsync method. The MediaStreamSource implementation should respond by calling ReportOpenMediaCompleted once it can describe the media to Silverlight.
For the media, the following information must be included in the description:
The duration.
Whether or not the media supports seeking.
For protected content, the DRM metadata needed to locate an appropriate license.
The description will also contain information about one or more streams. Each stream’s description must include the following information:
The identity of the codec.
A set of bytes, called the codec private data, to initialize the codec.
If the media has a video stream, the description of the stream must also include the following:
Width and height of the original encoded images.
This information is passed to Silverlight by the MediaStreamSource.ReportOpenMediaCompleted method. The ReportOpenMediaCompleted method takes the following parameters:
A dictionary of attributes and values describing the media.
A collection of MediaStreamDescription objects for both audio and video. Each object is created with a dictionary of attributes and values conveying the above stream information.
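For example, a minimal OpenMediaAsync override for a single H.264 video stream might look like the following sketch. The 30-second duration, the 640x360 frame size, the class and field names, and the GetCodecPrivateDataHex helper are illustrative placeholders, not part of the Silverlight API:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Windows.Media;

public class ParserMediaStreamSource : MediaStreamSource
{
    private MediaStreamDescription _videoDescription;

    protected override void OpenMediaAsync()
    {
        // Describe the media as a whole: duration (in 100-ns ticks) and seekability.
        var sourceAttributes = new Dictionary<MediaSourceAttributesKeys, string>
        {
            { MediaSourceAttributesKeys.Duration,
              TimeSpan.FromSeconds(30).Ticks.ToString(CultureInfo.InvariantCulture) },
            { MediaSourceAttributesKeys.CanSeek, true.ToString() }
        };

        // Describe the video stream: codec identity, codec private data, and
        // the width and height of the original encoded images.
        var videoAttributes = new Dictionary<MediaStreamAttributeKeys, string>
        {
            { MediaStreamAttributeKeys.VideoFourCC, "H264" },
            { MediaStreamAttributeKeys.CodecPrivateData, GetCodecPrivateDataHex() },
            { MediaStreamAttributeKeys.Width, "640" },
            { MediaStreamAttributeKeys.Height, "360" }
        };
        _videoDescription = new MediaStreamDescription(MediaStreamType.Video, videoAttributes);

        ReportOpenMediaCompleted(sourceAttributes, new[] { _videoDescription });
    }

    // Hypothetical: returns the base16-encoded SPS/PPS string for this stream.
    private string GetCodecPrivateDataHex() { return ""; }

    // Remaining abstract members, stubbed for brevity.
    protected override void GetSampleAsync(MediaStreamType mediaStreamType) { }
    protected override void SeekAsync(long seekToTime) { ReportSeekCompleted(seekToTime); }
    protected override void SwitchMediaStreamAsync(MediaStreamDescription mediaStreamDescription) { }
    protected override void GetDiagnosticAsync(MediaStreamSourceDiagnosticKind diagnosticKind) { }
    protected override void CloseMedia() { }
}
```

A MediaElement would then consume this source with mediaElement.SetSource(new ParserMediaStreamSource()).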
Video CodecPrivateData
Video codecs are identified by a four-character code stored with MediaStreamAttributeKeys.VideoFourCC. Codec initialization bytes are stored with MediaStreamAttributeKeys.CodecPrivateData as a base16-encoded string. The per-codec details are as follows:
H.264
| VideoFourCC | "H264" |
| --- | --- |
| CodecPrivateData | A base16-encoded string of the form `0x00000001 SequenceParameterSet 0x00000001 PictureParameterSet`. See ISO/IEC 14496-10 for details on start codes and the Sequence Parameter Set (SPS) and Picture Parameter Set (PPS) formats. |
| CodecPrivateData example | Example for 640x360 @ 1Mbps |
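Assuming a parser has already extracted the raw SPS and PPS NAL units as byte arrays, the string can be assembled with a sketch like this (BuildH264CodecPrivateData is a hypothetical helper, not a Silverlight API):

```csharp
using System.Text;

// Builds the H.264 CodecPrivateData string: a 0x00000001 start code before
// the SPS, another before the PPS, all rendered as base16.
static string BuildH264CodecPrivateData(byte[] sps, byte[] pps)
{
    var sb = new StringBuilder();
    AppendNalWithStartCode(sb, sps);
    AppendNalWithStartCode(sb, pps);
    return sb.ToString();
}

static void AppendNalWithStartCode(StringBuilder sb, byte[] nal)
{
    sb.Append("00000001");            // 4-byte start code
    foreach (byte b in nal)
        sb.Append(b.ToString("X2"));  // two hex digits per byte
}
```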
VC-1: Advanced profile
| VideoFourCC | "WVC1" |
| --- | --- |
| CodecPrivateData | A base16-encoded string of the form `ASFBindingByte 0x0000010F SequenceLayer 0x0000010E EntryPointLayer`. SequenceLayer and EntryPointLayer are in encapsulated (byte-stuffed) form. See the VC-1 specification for details on start code suffixes and the Sequence Layer and Entry Point Layer formats. See the ASF specification for details on the ASFBindingByte format. |
| CodecPrivateData example | Example for 1280x720 @ 2.436Mbps |
VC-1: Main and Simple profiles
| VideoFourCC | "WMV3" |
| --- | --- |
| CodecPrivateData | A base16-encoded string of the form `0x0000010F SequenceLayer`. SequenceLayer is in encapsulated (byte-stuffed) form. See the VC-1 specification for details on start code suffixes and the Sequence Layer format. |
MPEG-4 Part 2: Simple & Advanced profiles
| VideoFourCC | "MP43" |
| --- | --- |
| CodecPrivateData | No CodecPrivateData needed. |
Windows Media Video 7 through 9
| VideoFourCC | Codec details |
| --- | --- |
| "WMVA" | See VC-1 advanced profile. |
| "WMV2" | See VC-1 main and simple profiles. |
| "WMV1" | See VC-1 main and simple profiles. |
RGBA
| VideoFourCC | "RGBA" |
| --- | --- |
| CodecPrivateData | No CodecPrivateData needed. |
YV12
| VideoFourCC | "YV12" |
| --- | --- |
| CodecPrivateData | No CodecPrivateData needed. |
Note: For Windows developers: MSDN documentation on Windows codecs may refer to the VIDEOINFOHEADER, VIDEOINFOHEADER2, MPEG2VIDEOINFO, or similar structures. The CodecPrivateData used by Silverlight corresponds to the variable-length data appended to the end of those structures.
Audio CodecPrivateData
Audio codecs are identified and initialized using a WAVEFORMATEX structure. In some cases, additional data follows the WAVEFORMATEX structure. In other cases, the WAVEFORMATEX structure is part of a larger structure. In all cases, the structure and data that follows it are together converted into a little-endian base16-encoded string and stored with MediaStreamAttributeKeys.CodecPrivateData.
The first 16 bits of the CodecPrivateData always correspond to the first member of the WAVEFORMATEX structure, wFormatTag. This value identifies the codec and tells Silverlight how the remaining data is to be interpreted. The Windows Platform SDK header file mmreg.h contains many of the enumerations and structures described below.
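As a sketch of the encoding described above, the following hypothetical helper writes each WAVEFORMATEX field least-significant byte first and renders the result as a base16 string:

```csharp
using System;
using System.IO;

// Serializes a WAVEFORMATEX (plus any codec-specific data that follows it)
// into the little-endian base16 string expected by CodecPrivateData.
static string BuildWaveFormatExHex(
    ushort formatTag, ushort channels, uint samplesPerSec,
    uint avgBytesPerSec, ushort blockAlign, ushort bitsPerSample,
    byte[] extraData)
{
    using (var buffer = new MemoryStream())
    using (var writer = new BinaryWriter(buffer))  // BinaryWriter writes little-endian
    {
        writer.Write(formatTag);       // wFormatTag: identifies the codec
        writer.Write(channels);        // nChannels
        writer.Write(samplesPerSec);   // nSamplesPerSec
        writer.Write(avgBytesPerSec);  // nAvgBytesPerSec
        writer.Write(blockAlign);      // nBlockAlign
        writer.Write(bitsPerSample);   // wBitsPerSample
        writer.Write((ushort)(extraData == null ? 0 : extraData.Length)); // cbSize
        if (extraData != null)
            writer.Write(extraData);   // e.g. AudioSpecificConfig for AAC
        return BitConverter.ToString(buffer.ToArray()).Replace("-", "");
    }
}
```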
Per-codec details are as follows:
AAC (ISO Advanced Audio Coding)
The following is the AAC recommended format for desktop Silverlight and Silverlight for Windows Phone:
| wfx.wFormatTag | 0xFF (WAVE_FORMAT_RAW_AAC1) |
| --- | --- |
| CodecPrivateData | WAVEFORMATEX structure followed by AudioSpecificConfig data. The AudioSpecificConfig structure is described in ISO/IEC 14496-3. |
The following example shows the CodecPrivateData for HE-AAC v2 at 48kHz stereo content (SBR and PS enabled).
"FF000100C05D0000E02E0000040010000500130856E598"
The following example shows the CodecPrivateData for AAC-LC at 48kHz stereo 96kbps:
"FF00020080BB0000E02E00000400100002001190"
The following is an additional recommended AAC format for Silverlight for Windows Phone:
| wfx.wFormatTag | 0x1610 (WAVE_FORMAT_MPEG_HEAAC) |
| --- | --- |
| CodecPrivateData | HEAACWAVEFORMAT structure followed by AudioSpecificConfig data. The AudioSpecificConfig structure is described in ISO/IEC 14496-3. |
The following example shows the CodecPrivateData for HE-AAC v2 at 48kHz stereo content (SBR and PS enabled):
"10160100C05D0000E02E00000400100011000000FE000000000000000000130856E598"
The following is a deprecated AAC format:
| wfx.wFormatTag | 0x1601 (WAVE_FORMAT_MPEG_RAW_AAC) |
| --- | --- |
| CodecPrivateData | WAVEFORMATEX structure |
For HE-AAC, the nSamplesPerSec field in WAVEFORMATEX should be the pre-SBR (Spectral Band Replication) sample rate, and the nChannels field should be the pre-PS (Parametric Stereo) channel count.
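For illustration, the HE-AAC v2 example above can be reproduced with the hypothetical BuildWaveFormatExHex helper sketched earlier; note the pre-SBR sample rate and pre-PS channel count:

```csharp
// Reproduces the HE-AAC v2 example string; field values are read directly
// out of the example hex (little-endian).
string heAacV2 = BuildWaveFormatExHex(
    formatTag: 0x00FF,          // WAVE_FORMAT_RAW_AAC1
    channels: 1,                // pre-PS: mono
    samplesPerSec: 24000,       // pre-SBR: half of the decoded 48000 Hz
    avgBytesPerSec: 12000,
    blockAlign: 4,
    bitsPerSample: 16,
    extraData: new byte[] { 0x13, 0x08, 0x56, 0xE5, 0x98 }); // AudioSpecificConfig
// heAacV2 == "FF000100C05D0000E02E0000040010000500130856E598"
```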
WMA (Windows Media Audio)
The following table shows the codec information for Windows Media Audio V3.
| wfx.wFormatTag | 0x162 (WAVE_FORMAT_WMAUDIO3) |
| --- | --- |
| CodecPrivateData | WMAUDIO3WAVEFORMAT structure |
The following example shows a Windows Media Audio V3 CodecPrivateData for 44.1kHz stereo:
"6201020044AC0000451F0000CF05100012001000030000000000000000000000E00042C0"
The following table shows the codec information for Windows Media Audio V2.
| wfx.wFormatTag | 0x161 (WAVE_FORMAT_WMAUDIO2) |
| --- | --- |
| CodecPrivateData | WMAUDIO2WAVEFORMAT structure |
The following table shows the codec information for Windows Media Audio V1.
| wfx.wFormatTag | 0x160 (WAVE_FORMAT_MSAUDIO1) |
| --- | --- |
| CodecPrivateData | MSAUDIO1WAVEFORMAT structure |
MP3 (ISO MPEG-1 Layer III)
| wfx.wFormatTag | 0x55 (WAVE_FORMAT_MPEGLAYER3) |
| --- | --- |
| CodecPrivateData | MPEGLAYER3WAVEFORMAT structure |

See ISO/IEC 13818-3 for details on MP3.
The following example shows an MP3 (ISO MPEG-1 Layer III) CodecPrivateData at 22.05kHz stereo 80kbps with 522 bytes per frame, one frame per block:
"550002002256000010270000010000000C000100000000000A0201000000"
PCM / WAV
| wfx.wFormatTag | 1 (WAVE_FORMAT_PCM) |
| --- | --- |
| CodecPrivateData | WAVEFORMATEX structure with no codec-specific data. |
The following example shows a PCM / WAV CodecPrivateData for 48kHz stereo at 96kbps:
"0100020080BB0000E02E0000040010000000"
Note: For IIS developers: In the documentation for version 2.0 of the IIS Smooth Streaming manifest format, the WAVEFORMATEX portion of CodecPrivateData is omitted; it is instead constructed programmatically from XML attributes.
State Management
Media States
While a MediaElement has an explicit state, defined by MediaElement.CurrentState, the MediaStreamSource object does not. Rather, the state of the MediaStreamSource is determined by which methods have been invoked. For example, if the SeekAsync method of a MediaStreamSource has been called and it has not yet responded by calling ReportSeekCompleted, it is considered to be in a "seeking" state. In general, a MediaStreamSource is performing one of the following activities:
"opening"
The MediaElement has called OpenMediaAsync, but the MediaStreamSource implementation has not responded with ReportOpenMediaCompleted.
"seeking"
The MediaElement has called SeekAsync, but the MediaStreamSource implementation has not responded with ReportSeekCompleted.
"buffering"
The MediaElement has called GetSampleAsync for one of the media streams, and the MediaStreamSource responded with ReportGetSampleProgress.
"streaming"
The MediaStreamSource has not responded with ReportGetSampleProgress. In other words, the MediaStreamSource will respond to GetSampleAsync calls with calls to ReportGetSampleCompleted as soon as a sample is available.
"mediaEnded"
The MediaStreamSource notified the MediaElement that the final samples were delivered for all streams. This is done by reporting special End-of-Stream samples in response to GetSampleAsync (see the sketch after this list).
"closed"
CloseMedia has been called.
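For the "mediaEnded" case, a minimal sketch of reporting an End-of-Stream sample follows. Passing a null stream to the MediaStreamSample constructor marks the end of that stream; _videoDescription is assumed to be the stream's MediaStreamDescription:

```csharp
// Signals end-of-stream for the video stream: a MediaStreamSample
// constructed with a null stream marks the final sample.
var emptyAttributes = new Dictionary<MediaSampleAttributeKeys, string>();
var endOfStream = new MediaStreamSample(_videoDescription, null, 0, 0, 0, emptyAttributes);
ReportGetSampleCompleted(endOfStream);
```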
Stream States
A MediaStreamSource implementation might find it useful to maintain per-stream state as well. For example, an application might stop downloading audio if its audio buffer queue is full.
Synchronization
The MediaElement will only call one MediaStreamSource method at a time. Therefore, a SeekAsync request will not arrive at the same time as a GetSampleAsync request, nor will two GetSampleAsync requests arrive concurrently. However, there are still some cases a MediaStreamSource developer must consider:
Seeking
In desktop Silverlight, a SeekAsync request will not be delivered until all outstanding sample requests have been completed. Because the seek cannot proceed while a sample request is outstanding, the implementation must complete the pending request to unblock it; one solution is to report the previous sample a second time.
In Silverlight for Windows Phone, a SeekAsync request will be delivered even if an outstanding sample request is pending. However, when that sample request is completed, Silverlight for Windows Phone will erroneously discard the sample. If that sample is a key frame, it may be necessary to report the sample a second time.
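One way to structure the Windows Phone workaround is sketched below; _resendKeyFrame, _lastVideoSample, RepositionParser, and DequeueSample are hypothetical members that a real implementation would maintain (it would set _resendKeyFrame when a key-frame sample was completed while a seek was pending):

```csharp
protected override void SeekAsync(long seekToTime)
{
    RepositionParser(seekToTime);   // hypothetical: move the parser to the seek point
    ReportSeekCompleted(seekToTime);
}

protected override void GetSampleAsync(MediaStreamType mediaStreamType)
{
    if (_resendKeyFrame && mediaStreamType == MediaStreamType.Video)
    {
        // The runtime discarded this key frame during the seek; report it
        // a second time so decoding can restart from it.
        _resendKeyFrame = false;
        ReportGetSampleCompleted(_lastVideoSample);
        return;
    }
    ReportGetSampleCompleted(DequeueSample(mediaStreamType)); // hypothetical
}
```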
Buffering transitions
In Silverlight for Windows Phone, calling ReportGetSampleProgress will not prevent subsequent GetSampleAsync calls from arriving. Buffering is a media state, not a stream state. Calling ReportGetSampleCompleted on any stream will cause the MediaElement to exit buffering. As such, a MediaStreamSource should not complete a sample on any stream until buffering is completed for all streams.
Buffer Management
One of the most challenging tasks in creating a MediaStreamSource is implementing buffer management logic. Poor logic can result in excessive entries into the Buffering state and post-starvation audio/video (AV) sync issues.
A MediaStreamSource implementation must take into account two buffers:
The buffer maintained by the pipeline. This buffer consists of all the samples reported by the MediaStreamSource that have not yet been rendered.
The buffer maintained by the MediaStreamSource implementation itself. This buffer may be especially deep when the MediaElement is in the Buffering state, since the pipeline is not draining samples.
A typical MediaStreamSource implementation handles data as follows during normal playback:
Media bytes arrive from the network. These bytes are appended to a XAP-managed media buffer, which may contain several seconds of content.
The XAP parses out individual audio and video samples, and adds buffer references to an audio or video queue.
The XAP responds to Silverlight sample requests, transferring samples from its media buffer to Silverlight’s pipeline buffer. The XAP buffer no longer needs to remember that sample.
The sample is presented to the user. The timing is specified by the sample’s timestamp.
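A sketch of the XAP-managed media buffer this describes; SampleQueue and its members are illustrative, not part of the Silverlight API:

```csharp
using System.Collections.Generic;
using System.Windows.Media;

// The XAP-managed media buffer: parsed samples waiting to be handed to the
// Silverlight pipeline. A real implementation keeps one queue per stream.
public class SampleQueue
{
    private readonly Queue<MediaStreamSample> _samples = new Queue<MediaStreamSample>();

    // Step 2: the parser appends a sample reference.
    public void Enqueue(MediaStreamSample sample) { _samples.Enqueue(sample); }

    // Step 3: the sample moves to the pipeline buffer; this queue no longer
    // needs to retain it.
    public MediaStreamSample Dequeue() { return _samples.Dequeue(); }

    public int Count { get { return _samples.Count; } }
}
```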
Pipeline Buffers
The sizes of the Silverlight pipeline buffers are influenced by the MediaStreamSource.AudioBufferLength property, which is measured in milliseconds. The MediaElement.BufferingTime is ignored for MediaStreamSource classes. Developers can either use the default value for AudioBufferLength, which is 1000 (one second), or set the property to a different value prior to calling ReportOpenMediaCompleted.
How the AudioBufferLength property is interpreted differs between platforms:
In desktop Silverlight, the buffer depth of the audio stream, as measured in time, is set by AudioBufferLength.
In Silverlight for Windows Phone, the audio and video buffer depths are influenced by AudioBufferLength, but are sized in bytes assuming poor compression ratios. As such, the effective buffer depth can be much deeper in practice.
The most accurate and reliable way to calculate the current depth of a pipeline stream buffer is to subtract MediaElement.Position from the timestamp of the last reported sample for that stream, and then add the duration of the sample.
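A sketch of that calculation for the video stream, assuming hypothetical _lastVideoTimestampTicks and _lastVideoDurationTicks fields (in 100-ns ticks) that are updated each time a video sample is reported, and that the implementation can reach the MediaElement:

```csharp
// Depth of the video pipeline buffer, per the rule above:
// (timestamp of last reported sample + its duration) - current position.
private TimeSpan GetVideoPipelineBufferDepth(MediaElement mediaElement)
{
    long bufferEndTicks = _lastVideoTimestampTicks + _lastVideoDurationTicks;
    return TimeSpan.FromTicks(bufferEndTicks) - mediaElement.Position;
}
```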
Starvation
Starvation occurs when:
The MediaElement was not placed in the Buffering state as a result of a call to ReportGetSampleProgress.
All samples in one of the pipeline buffers of the stream have been rendered.
How this situation is handled depends on the platform:
In desktop Silverlight, the MediaElement remains in the Playing state, but the Position stops advancing.
In Silverlight for Windows Phone, the MediaElement remains in the Playing state, but the Position continues to advance. When samples arrive, they may be discarded if the position has advanced past their timestamps. In some circumstances, AV sync issues may arise when playback continues. This behavior may be changed to match desktop Silverlight in a subsequent release.
A successful MediaStreamSource implementation will ensure the Silverlight pipeline never starves. This is accomplished by monitoring the depth of each pipeline buffer. If a video or audio pipeline buffer is about to be empty, a MediaStreamSource implementation should perform the following actions:
Respond to GetSampleAsync requests by calling ReportGetSampleProgress. This updates the level of progress and allows the MediaElement to transition to Buffering.
Stop completing samples on any stream until the media buffer of the XAP is rebuilt.
After the media buffer of the XAP is rebuilt, a MediaStreamSource implementation should perform the following actions:
Call ReportGetSampleProgress with 1.0 to indicate buffering level has reached 100 percent.
Begin completing samples to allow the MediaElement state to transition from Buffering to Playing.
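Putting these steps together, a sketch of the request-deferral logic; _buffering, _deferredRequests, NearStarvation, MediaBufferFillRatio, and DequeueSample are hypothetical members of the implementation:

```csharp
private bool _buffering;
private readonly Queue<MediaStreamType> _deferredRequests = new Queue<MediaStreamType>();

protected override void GetSampleAsync(MediaStreamType mediaStreamType)
{
    if (_buffering || NearStarvation(mediaStreamType))
    {
        // Defer the request and report progress so the MediaElement can
        // transition to Buffering; no stream completes a sample until the
        // media buffer is rebuilt for all streams.
        _buffering = true;
        _deferredRequests.Enqueue(mediaStreamType);
        ReportGetSampleProgress(MediaBufferFillRatio());
        return;
    }
    ReportGetSampleCompleted(DequeueSample(mediaStreamType));
}

// Called from the download path as the XAP media buffer refills.
private void OnMediaBufferUpdated()
{
    if (_buffering && MediaBufferFillRatio() >= 1.0)
    {
        ReportGetSampleProgress(1.0);   // buffering level has reached 100 percent
        _buffering = false;
        while (_deferredRequests.Count > 0)
            ReportGetSampleCompleted(DequeueSample(_deferredRequests.Dequeue()));
    }
}
```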