Microsoft eCDN technical overview

12/08/2022

Introduction

Microsoft eCDN operates a WebRTC-based peer-to-peer (P2P) CDN that delivers HLS and MPEG-DASH video streams. The solution requires no additional software, client plug-in, or hardware. All you need is an HTML5-compliant web browser or the Teams desktop application.

Microsoft eCDN solves the network congestion problem that occurs during large streaming events such as all-hands meetings. If every employee tries to watch the same stream at the same time, the office ISP link becomes saturated. However, when Microsoft eCDN is deployed, an efficient P2P mesh network is formed during these large streaming events, which significantly reduces the load on the ISP link.

Being a 100% standards-based and SaaS-only service also means:

The time it takes to test and deploy Microsoft eCDN is only a few days.
Microsoft eCDN is inherently secure, as it follows all Microsoft O365 security standards, and consists of JavaScript code, which runs in the limited, sandboxed environment of standard Web browsers or the streaming platform's client.

System overview

Microsoft eCDN operates as a service that orchestrates peers while providing analytics and control. The system is designed to be compatible with existing industry standards and technologies. This means it is designed to work with:

HTTP-based streaming protocols such as HLS and MPEG-DASH.
HTML5-based video players (JWPlayer, Video.js, Clappr, Kaltura, etc.) and any native Android or iOS player (ExoPlayer, AVPlayer, etc.)
HTTP-based CDNs: Akamai, Fastly, CloudFront, Cloudflare, Azure CDN, etc.
Streaming servers: Wowza, Nimble, Nginx rtmp module, etc.
DRM technologies: Widevine, PlayReady, FairPlay, etc.

To remain fully compatible with existing technologies and infrastructure while still augmenting them, Microsoft eCDN uses a hybrid content delivery model. That is, each viewer can download resources from both the P2P network and the HTTP network simultaneously.

Overview diagram of eCDN infrastructure.

At a high level, the eCDN system is composed of:

Peering discovery service: Responsible for peer discovery.
Switchboard: Responsible for creating the initial P2P connections between viewers.
Data pipeline: Consumes all service telemetry and stores it in a data warehouse for analytics consumption.
Player Plugin: Responsible for intercepting and forwarding video-related requests to the Client SDK.
Client SDK: Responsible for intelligently requesting video resources from HTTP / P2P and stitching the data buffers in real-time.

The components interact as follows:

- The Client SDK connects to the backend (Peering discovery service, Switchboard, Data pipeline).
- The Peering discovery service sends the Client SDK a set of peers that it believes will benefit this particular viewer. Peers are selected based on network proximity, cache allocation, stream relevance, and other parameters.
- With the help of the Switchboard, the Client SDK establishes WebRTC data channel connections with the specified set of peers.
- The Player Plugin intercepts the HTTP requests generated by the video player and forwards them to the Microsoft eCDN Client SDK. Based on real-time measurements, the Client SDK decides whether to fetch the requested resource from the P2P network, from HTTP, or from both concurrently, so that it can return the resource to the player as efficiently and promptly as possible.
- Manifests, DRM licenses, and encryption keys are always retrieved from the HTTP edge server, both to get the most current copy and to adhere to authorization mechanisms.
- Independently, the Client SDK requests authorization from the Microsoft eCDN backend to create peer connections. Once authorized, the Client SDK begins to download resources from HTTP and P2P.
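
The source-selection step in this flow can be illustrated with a minimal TypeScript sketch. It is not the Client SDK's actual logic: the type names, the `chooseSource` function, the two measurements, and the `LOW_BUFFER_SECONDS` threshold are all assumptions made for illustration. The only behavior taken from the description above is that manifests, licenses, and keys always go to HTTP, while segments may come from P2P, HTTP, or both.

```typescript
// Hypothetical resource kinds and sources; these names are not part of the eCDN SDK.
type ResourceKind = "manifest" | "license" | "key" | "segment";
type Source = "http" | "p2p" | "both";

// Assumed real-time measurements available when a request is intercepted.
interface Measurements {
  peersWithSegment: number; // connected peers that already hold the segment
  bufferSeconds: number;    // seconds of video currently buffered in the player
}

// Assumed threshold: below this buffer level, HTTP is requested as well.
const LOW_BUFFER_SECONDS = 4;

function chooseSource(kind: ResourceKind, m: Measurements): Source {
  // Manifests, DRM licenses, and keys are always fetched from the HTTP edge server.
  if (kind !== "segment") return "http";
  // No connected peer holds the segment, so HTTP is the only option.
  if (m.peersWithSegment === 0) return "http";
  // Playback is at risk: ask both sources concurrently.
  if (m.bufferSeconds < LOW_BUFFER_SECONDS) return "both";
  // Otherwise the P2P network can serve the segment.
  return "p2p";
}
```

With these assumed inputs, a segment request made with a healthy buffer and available peers is routed to P2P, while the same request made with a nearly empty buffer goes to both sources.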

Client logic overview

The Client SDK fetches content concurrently from HTTP and P2P sources. As a result, the user experience isn't degraded when a P2P source is too slow or fails to deliver a segment in time, because the segment can still arrive over HTTP.
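
One way to picture concurrent fetching is a helper that starts every source at once and returns whichever succeeds first. The sketch below is an assumption-laden illustration, not the Client SDK's implementation: `Fetcher` and `firstSuccessful` are hypothetical names, and a real client would also cancel the slower request and account for the bytes already received.

```typescript
// Hypothetical fetcher type: any function that eventually yields segment bytes.
type Fetcher = () => Promise<Uint8Array>;

// Start all sources concurrently and resolve with the first one that succeeds.
// Rejects only when every source has failed.
function firstSuccessful(fetchers: Fetcher[]): Promise<Uint8Array> {
  return new Promise<Uint8Array>((resolve, reject) => {
    if (fetchers.length === 0) {
      reject(new Error("no sources configured"));
      return;
    }
    let failures = 0;
    for (const fetcher of fetchers) {
      fetcher().then(resolve, () => {
        failures += 1;
        if (failures === fetchers.length) {
          reject(new Error("all sources failed"));
        }
      });
    }
  });
}
```

Passing a P2P fetcher and an HTTP fetcher to `firstSuccessful` means a stalled or failed peer never blocks playback as long as the HTTP request completes.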

Security

Microsoft eCDN is compliant with Microsoft O365 security standards.

The service is as secure as any traditional server-based CDN service. Because it is a hybrid solution, one that uses eCDN in combination with a traditional HTTP server, we leverage the existing security infrastructure (tokens, keys, cookies, etc.) that a customer already has in place.

In terms of communication, peers are connected to each other via WebRTC data channels, which carry SCTP over DTLS encryption. Additionally, each viewer is connected to the backend via a secure WebSocket connection that uses TLS encryption. As a result, both the data sent between viewers and the metadata sent between each viewer and the backend are encrypted in transit.

In terms of stream security, there are several scenarios:

Authentication on session start

In this case, every session begins with the server asking the viewer for a user ID and password. If these credentials are valid, the server will send the manifest file to the viewer and the video player will start requesting segments and additional manifests from the HTTP server accordingly. Microsoft eCDN doesn't insert itself into the validation process and the viewer must pass through the same authentication gates whether or not Microsoft eCDN is deployed. Only viewers who are authorized for a stream can participate in P2P sharing for that stream and they only share while they're actually watching the stream.

URL timed tokenization

In this case, the manifest URL carries an additional token that encodes a few details about the viewer's user agent (IP address, expiration time, etc.). A malicious user who obtains a manifest URL, whether by logging in or by other means, can distribute it to unauthorized viewers. Those viewers still won't be able to access the stream: because the manifest URL is tokenized, the HTTP server rejects their requests, either because of an IP address or other user agent mismatch or because the token has expired. With Microsoft eCDN, all manifest requests are sent directly to the HTTP server, so this validation is never bypassed.
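
The server-side check described here can be sketched as follows. This is a simplified illustration under stated assumptions: the `TokenClaims` shape, the `validateToken` function, and the verdict names are hypothetical, and verification of the token's signature is deliberately left out. Real token formats are defined by the CDN or media server, not by Microsoft eCDN.

```typescript
// Hypothetical claims decoded from a tokenized manifest URL.
interface TokenClaims {
  ip: string;        // viewer IP address the token was issued for
  expiresAt: number; // expiry as Unix time in seconds
}

type Verdict = "ok" | "ip-mismatch" | "expired";

// Decide whether the HTTP server should serve the manifest for this request.
// Signature verification is omitted from this sketch.
function validateToken(claims: TokenClaims, requestIp: string, nowSeconds: number): Verdict {
  if (claims.ip !== requestIp) return "ip-mismatch";
  if (nowSeconds >= claims.expiresAt) return "expired";
  return "ok";
}
```

A URL forwarded to another viewer fails the IP check, and a URL replayed after its expiry time fails the time check, which matches the two rejection reasons described above.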

Video segment content protection

Unauthorized users who gain access to the stream URLs may still attempt to access the content of the video segments via other peers. If the segments are unencrypted, the following risk exists: an unauthorized user can receive a segment URL from another user, find peers that hold that resource, and request it directly from them, even though the media server or CDN wouldn't allow access to it.

When content tokenization is enabled, we ensure that the user is authenticated on the resource level before other peers can send data to that user. This is a granular mechanism that can grant access to certain resources and reject access to other resources on the same session.
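
The idea of resource-level authorization can be sketched as a grant table that a peer consults before sending any data. The article doesn't describe how Microsoft eCDN implements this check, so everything below (the `ResourceGrants` class, `servePeerRequest`, and the in-memory cache) is a hypothetical model of the behavior, not the service's mechanism.

```typescript
// Hypothetical in-memory grant table; names are illustrative, not eCDN APIs.
class ResourceGrants {
  private readonly grants = new Map<string, Set<string>>();

  // Record that a viewer has been authorized for one specific resource.
  grant(viewerId: string, resourceUrl: string): void {
    const allowed = this.grants.get(viewerId) ?? new Set<string>();
    allowed.add(resourceUrl);
    this.grants.set(viewerId, allowed);
  }

  // True only if this viewer holds a grant for this exact resource.
  isAllowed(viewerId: string, resourceUrl: string): boolean {
    return this.grants.get(viewerId)?.has(resourceUrl) ?? false;
  }
}

// A peer answers a request only when the requester is authorized for that resource.
function servePeerRequest(
  grants: ResourceGrants,
  cache: Map<string, Uint8Array>,
  viewerId: string,
  resourceUrl: string
): Uint8Array | null {
  if (!grants.isAllowed(viewerId, resourceUrl)) return null;
  return cache.get(resourceUrl) ?? null;
}
```

Because grants are keyed by resource, the same viewer can be served one segment and refused another within a single session.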

Further protection measures include encryption:

Encryption

Let's take, for example, an HLS stream that is protected with AES-128 encryption. Malicious users can send the manifest URL to unauthorized viewers, or even the video segments themselves, but as long as the unauthorized viewers don't have access to the decryption key, they won't be able to watch the stream. The key can be sent to the end user in numerous ways, for example, via the main manifest, or via the HTML page, or some other path. Regardless, the service doesn't insert itself into this process, and the key is delivered to the video player using the same mechanism whether or not the service is deployed, which means the key is just as secure with or without Microsoft eCDN.
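
In HLS, a manifest that uses AES-128 typically references its key through an `#EXT-X-KEY` tag, and the player fetches the key from the URI in that tag. The sketch below parses such a tag to show that the key location is separate from the segments themselves. The `KeyInfo` type and `parseKeyTag` function are hypothetical helpers written for illustration, and the parser handles only the simple attribute syntax shown here.

```typescript
// Hypothetical result type for a parsed #EXT-X-KEY tag.
interface KeyInfo {
  method: string;
  uri: string | undefined;
  iv: string | undefined;
}

// Parse one manifest line; returns null if it is not a usable #EXT-X-KEY tag.
function parseKeyTag(line: string): KeyInfo | null {
  const prefix = "#EXT-X-KEY:";
  if (!line.startsWith(prefix)) return null;
  const body = line.slice(prefix.length);

  // Attributes are NAME=value pairs; quoted values may contain commas.
  const attrs = new Map<string, string>();
  const pattern = /([A-Z0-9-]+)=("[^"]*"|[^,]*)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(body)) !== null) {
    const raw = match[2];
    const value = raw.startsWith('"') ? raw.slice(1, -1) : raw;
    attrs.set(match[1], value);
  }

  const method = attrs.get("METHOD");
  if (method === undefined) return null;
  return { method, uri: attrs.get("URI"), iv: attrs.get("IV") };
}
```

For a line such as `#EXT-X-KEY:METHOD=AES-128,URI="https://example.com/key"` (an example URL, not a real endpoint), the parser returns the method and the key URI. A viewer who receives segments from a peer but can't retrieve the key from that URI can't decrypt them.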

DRM

The DRM use case resembles the encryption use case. The only difference is that the license and keys are distributed by the DRM mechanism instead of by the broadcaster. Here as well, Microsoft eCDN doesn't interfere with the distribution of the license or keys and thus doesn't compromise them.