May 2013
Volume 28 Number 05
Windows Azure Insider - Geo-Protection for Video Blobs Using a Node.js Media Proxy
By Bruno Terkaly, Ricardo Villalobos | May 2013
Windows Azure provides three useful options for its cloud storage system: tables, blobs, and queues, all backed by a redundant and resilient global infrastructure. We looked at tables in our last column (msdn.microsoft.com/magazine/dn166928); now we’ll focus on blobs, in particular on how to resolve certain security issues that can arise when you use them.
Blobs are used to store binary data, and Windows Azure gives you two types to choose from: page blobs, which are used for random data access and can be up to 1TB; and block blobs, which are optimized for uploading and streaming purposes and can contain up to 200GB of data. Blobs are used as the foundation for many services in Windows Azure, including OS-formatted disks and video assets. In terms of performance, the throughput target for each blob is up to 60 MB/s. Figure 1 shows how blob storage is structured, partitioned and accessed.
Figure 1 Windows Azure Blob Storage Concepts
Blobs are exposed to the world via the HTTP and HTTPS protocols and can be either public or private. When a blob is marked as private, a policy-based Shared Access Signature (a URL) is used to grant access to a specific blob, or to any blob within a specified container for a specific period of time. Once the URL for a specific blob is known, however, the only way to prevent access to it is by modifying, expiring or deleting the corresponding security policy.
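To make the revocation point concrete, here’s a minimal sketch (ours, not part of the downloadable sample) that reads the validity information a SAS URL carries. It relies on two standard SAS query parameters: se (signed expiry) and si (the identifier of a stored access policy). The function name is hypothetical, and the account, container, blob and signature values in any usage are placeholders.

```javascript
// Reads the validity information carried by a Shared Access Signature URL.
// se = signed expiry (ISO 8601, UTC); si = stored access policy identifier.
function describeSasValidity(sasUrl) {
  const params = new URL(sasUrl).searchParams;
  const expiry = params.get("se");
  return {
    // When se is absent, the expiry is defined by the stored access
    // policy that si points to, not by the URL itself.
    expiresAt: expiry ? new Date(expiry) : null,
    policyId: params.get("si") // null when the SAS is not policy-based
  };
}
```

If the URL carries only si, the expiry lives in the container’s stored access policy, which is why modifying or deleting that policy is the way to cut off a URL that has already been shared.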
This condition presents a challenge for some use cases, such as restricting access to video assets when the URL of the blob needs to be publicly shared for streaming purposes. Two common levels of protection are often required: geographical access, which restricts the countries or regions where the video can be played; and referrer access, which restricts the domains or Web sites where the video can be embedded. This is particularly important for media corporations that acquire digital rights to broadcast events (such as the Olympics or the soccer World Cup) in specific countries, as well as for marketing agencies creating location-based advertising.
In this article, we’ll show you how to create a reverse proxy server in Windows Azure that offers a solution to these security requirements, as shown in Figure 2.
Figure 2 Providing Geo-Protection for Video Assets Using a Reverse Proxy Server
For our solution, we’ll use Node.js, a powerful server-side JavaScript-based platform we discussed in a previous column (see “Real-World Scenarios for Node.js in Windows Azure” at msdn.microsoft.com/magazine/jj991974). The reason for using Node.js is simple: It requires minimal memory and CPU use while supporting thousands of connections per server. And by deploying our application to Windows Azure, we can scale as needed, based on traffic and demand. Here’s the basic flow we’ll follow:
- Capture the original request, generated by a remote client, to embed a video from the Web server.
- Identify the referrer page and validate that the domain is authorized to embed the video.
- Identify the country from which the request originated and verify that it’s authorized.
- If all the criteria are met, stream the video to the client.
- If any criterion isn’t met, stream a video showing an error message.
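The flow above can be sketched as a chain of Node.js-style callbacks. This is a minimal illustration under stated assumptions, not the code in server.js: the three check functions (isReferrerAllowed, findCountry and isCountryAllowed) are hypothetical names supplied by the caller, and the reason values "referrer" and "country" are illustrative labels for choosing which error video to stream.

```javascript
// Runs the referrer check, then the country lookup, then the country check.
// Each check is a Node-style async function: fn(input, callback(err, result)).
// done(err, { allowed, reason }) is called exactly once.
function decideStream(request, checks, done) {
  checks.isReferrerAllowed(request.referrerDomain, function (err, referrerOk) {
    if (err) return done(err);
    if (!referrerOk) return done(null, { allowed: false, reason: "referrer" });

    checks.findCountry(request.clientIp, function (err, country) {
      if (err) return done(err);

      checks.isCountryAllowed(country, function (err, countryOk) {
        if (err) return done(err);
        if (!countryOk) return done(null, { allowed: false, reason: "country" });
        // All criteria met: the caller streams the requested video.
        done(null, { allowed: true, reason: null });
      });
    });
  });
}
```

Injecting the checks keeps the decision logic testable with stubs; in the real proxy they would query the proxyreferrers table, the MongoDB geolocation database and the proxycountries table, respectively.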
We’ll be using the following cloud resources to make our solution work:
- Windows Azure Media Services to generate the video assets to be used in code.
- Windows Azure Tables to store the list of authorized countries and referrers.
- MongoDB (mongodb.org) to host the IP geolocation database.
- Windows Azure Web Sites to host the Node.js reverse proxy and the sample Web page.
Hosting Video Content in Windows Azure
Before you start, you need to upload and encode some video content in the cloud for testing purposes. You can do this easily by signing in to the Windows Azure portal at manage.windowsazure.com and clicking Media Services. (For more information on Media Services, please refer to our June 2012 article, “Democratizing Video Content with Windows Azure Media Services,” at msdn.microsoft.com/magazine/jj133821.) If you don’t have a subscription, you can request a trial account at bit.ly/YCNEd3. For this example, we uploaded and encoded three videos, as shown in Figure 3.
Figure 3 Video Assets Used in the Example
The Publish URL link associated with each video allows it to be played and embedded in other pages. Our proxy server will use these links to stream the media content to viewers around the world, after validating the location and page from which the request originated.
Storing Validation Information in Windows Azure Table Storage
The next step is to create a few tables in Windows Azure Storage that will help with the validation process. As we explained in our previous column, these tables are based strictly on key-value pairs, but they serve our purpose for this project. Figure 4 describes the structure of the tables.
Figure 4 Storage Table Structure for the Video Content
Table | Entity Properties |
proxycountries | PartitionKey: country abbreviation; RowKey: “true” or “false” (for access) |
proxyreferrers | PartitionKey: domain where the video is hosted; RowKey: “true” or “false” (for access) |
proxyvideos | PartitionKey: friendly name for the video asset; RowKey: encoding format; URL: published URL for the video asset in Media Services |
proxyrejects | PartitionKey: “error” or “reject”; RowKey: category for the error or reject; Description: details about the error or reject |
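Because the PartitionKey/RowKey pair uniquely identifies an entity, the lookups the proxy needs are simple. The following sketch runs against hypothetical in-memory arrays of entities shaped like the rows in Figure 4 (in the real solution, these come from Windows Azure Table storage); the function names are ours.

```javascript
// True only if an entity with this PartitionKey has RowKey "true".
// Used for proxycountries (country code) and proxyreferrers (domain).
function isAccessAllowed(entities, key) {
  return entities.some(function (e) {
    return e.PartitionKey === key && e.RowKey === "true";
  });
}

// Returns the published Media Services URL for a video and encoding,
// or null when proxyvideos has no matching entity.
function findVideoUrl(videos, friendlyName, encoding) {
  const match = videos.find(function (v) {
    return v.PartitionKey === friendlyName && v.RowKey === encoding;
  });
  return match ? match.URL : null;
}
```

Treating a missing entity as "denied" is a deliberate deny-by-default choice: only countries and domains explicitly marked “true” get through.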
You can create these tables in Windows Azure Storage using one of the free tools available for download. We particularly like Azure Storage Explorer from Neudesic, which can be downloaded from CodePlex at bit.ly/H3rOC. For basic functionality, you’ll need to insert at least one entity into the proxycountries and proxyreferrers tables, with PartitionKey=“undefined” and RowKey=“true.” Requests made against a local server typically carry no recognizable referrer domain or forwarded client IP address, so both lookups resolve to “undefined”; these entities let you test your Node.js media proxy locally. In last month’s article, we discussed the importance of selecting the correct partition and row keys for the best query performance.
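One way to derive the PartitionKey for the proxyreferrers lookup, sketched here under our own assumptions (the helper name is hypothetical, and server.js may do this differently), is to take the host name from the Referer header and fall back to the string “undefined” when the header is missing or can’t be parsed.

```javascript
// Maps the Referer header to a proxyreferrers PartitionKey.
// Missing or malformed headers map to "undefined", matching the test entity.
function referrerPartitionKey(refererHeader) {
  if (!refererHeader) return "undefined";
  try {
    // hostname excludes the scheme, port and path, and is lowercased.
    return new URL(refererHeader).hostname;
  } catch (e) {
    return "undefined"; // not an absolute URL: treat as unknown
  }
}
```

For example, a request embedded in a page at http://rvvideo.azurewebsites.net/index.html would be checked against the PartitionKey rvvideo.azurewebsites.net.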
Preparing the Geolocation Database
There are a few companies offering databases and services for geolocation purposes. MaxMind is one of them, and because the company provides a GeoLite version that can be used under a Creative Commons license, we decided to include it in our project. The CSV file can be downloaded from bit.ly/W5Z7qA. This database allows us to identify the country where the video request is coming from, based on the IP address.
Our next decision involved where to host this database in the cloud. Because we’ll need to perform a range search on this table (not natively supported by Windows Azure Table Storage), we opted to use MongoDB, an open source document-oriented (JSON) database engine developed and supported by 10gen. MongoDB supports multiple indexes and complex queries, which is ideal for our solution. The good news is that there are a few companies offering this database as a service, including MongoLab, available in the Windows Azure Store. You can also sign up for an account at mongolab.com, selecting Windows Azure as your hosting provider. Make sure to select the same datacenter where you created the Windows Azure Storage tables.
When you’ve created your MongoDB database, you can access it using a URL similar to this:
mongodb://{username}:{password}@{server_name}.mongolab.com:{port_number}/{database_name}
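If you assemble this connection string in code, percent-encode the user name and password so that characters such as @ or : don’t break the URI. This is a minimal sketch; the function and its option names are hypothetical, and the values in the usage line are placeholders.

```javascript
// Builds a MongoLab connection string in the format shown above.
// encodeURIComponent escapes reserved characters such as "@", ":" and "/".
function buildMongoUri(opts) {
  return "mongodb://" +
    encodeURIComponent(opts.username) + ":" +
    encodeURIComponent(opts.password) + "@" +
    opts.serverName + ".mongolab.com:" + opts.port + "/" + opts.database;
}
```

For instance, buildMongoUri({ username: "user", password: "p@ss", serverName: "ds012345", port: 45678, database: "geo" }) returns mongodb://user:p%40ss@ds012345.mongolab.com:45678/geo.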
In order to import the MaxMind CSV file into your database, simply download the MongoDB tools from mongodb.org/downloads. Once you’ve installed them on your computer, run the following command:
mongoimport -h {servername}.mongolab.com:{port_number} -d {database_name} -c {collection_name} -u {username} -p {password} --file {MaxMind CSV file} --type csv --headerline
Now you can run queries against the geolocation database.
Reviewing the Source Code
If you haven’t already, please download the source code for this article from msdn.com/magazine/msdnmag0513. The code consists of three different files: server.js, config.json and package.json.
The main part of the code is in server.js, where you’ll see a few modules defined in the first lines (the third-party modules will be automatically downloaded and installed later from npmjs.org; url is built into Node.js):
- request: Simplifies the process of sending and streaming requests to external sites.
- azure: Provides access to Windows Azure Storage, including tables.
- url: Facilitates the parsing of URL strings.
- mongodb: Provides access to MongoDB databases.
- nconf: Simplifies the process of setting and retrieving application settings.
Also, a few variables are set in the first portion of the code, following this format:
var port = process.env.PORT || nconf.get("PORT_NUMBER");
This allows their values to be retrieved either from the Windows Azure Web Sites configuration parameters (once deployed to the cloud) or from the local config.json file (when the parameter can’t be found in the Windows Azure environment). Finally, a client is created for Windows Azure Table storage, a default agent for the subsequent requests is defined and a placeholder for the error-logging object is initialized to null, as shown in the following initialization code:
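The lookup order in that line can be isolated into a small helper. In this sketch, getSetting is a hypothetical name, a plain object parsed from config.json stands in for nconf, and env would normally be process.env.

```javascript
// Environment variable first (set by Windows Azure Web Sites app settings),
// then the local config.json value, mirroring
// process.env.PORT || nconf.get("PORT_NUMBER").
function getSetting(envName, configKey, env, config) {
  return env[envName] || config[configKey];
}
```

Environment variables are always strings, so a value read from process.env arrives as "8080" even when the same setting in config.json is the number 8080.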
// Modules declared at the top of server.js
var http = require('http');
var azure = require('azure');
// Create Windows Azure Storage Table client
console.log('Connecting to Windows Azure Table Service');
var tableService = azure.createTableService(storageServiceName,
    storageKey, storageServiceUrl);
// Create custom agent for requests; number of sockets can be tweaked
var agent = new http.Agent();
agent.maxSockets = 1000;
// Placeholder for errorEntity object
var errorEntity = null;
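On newer versions of Node.js, the socket limit and connection reuse can both be set when the agent is constructed. This sketch assumes a Node.js release that supports the keepAlive option of http.Agent (the option arrived after the releases current when this article was written); createProxyAgent is a hypothetical helper name.

```javascript
const http = require("http");

// A shared agent for the outbound requests to Media Services.
// keepAlive reuses idle sockets instead of opening a new TCP connection
// per request; maxSockets caps concurrent sockets per host. (Node.js 0.10
// and earlier defaulted to 5 sockets per host, which is why raising the
// limit mattered for a proxy.)
function createProxyAgent(maxSockets) {
  return new http.Agent({ keepAlive: true, maxSockets: maxSockets });
}

const agent = createProxyAgent(1000);
```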
Once the initialization process has been performed, it’s time to create an HTTP server in Node.js, with its corresponding callback function. The basic waterfall structure in high-level pseudocode—derived from the asynchronous, event-based nature of Node.js—looks like this:
Create HTTP Server (request, response)
  {callback to}
    Find the page where the video is hosted, using the referer header value
    Find the origin IP address by using the x-forwarded-for header value
    Split the request URL to find the video-friendly name and encoding
    {callback to}
      Query the proxyreferrers table for validation
      {callback to}
        Query the MongoDB database to find the request country
        {callback to}
          Query the proxycountries table for validation
          {callback to}
            Stream the video using the request function
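The first step of the waterfall (reading the referrer, the client IP address, and the video name and encoding) can be sketched as a pure function. This is an illustration under assumptions, not the code in server.js: it assumes a hypothetical /{friendlyName}/{encoding} request path, handles IPv4 only, and strips a port from x-forwarded-for because some front ends append one.

```javascript
// Extracts the inputs for the later validation steps from a request.
// Assumes a hypothetical path format of /{friendlyName}/{encoding}.
function parseProxyRequest(reqUrl, headers) {
  // x-forwarded-for may hold "client, proxy1, ..." and may include a
  // port ("203.0.113.5:51234"); keep only the first IPv4 address.
  const forwarded = headers["x-forwarded-for"] || "";
  const clientIp = forwarded.split(",")[0].trim().split(":")[0];
  // A base URL is needed because req.url is only a path.
  const segments = new URL(reqUrl, "http://localhost").pathname
    .split("/")
    .filter(function (s) { return s.length > 0; });
  return {
    referrer: headers["referer"], // Node.js lowercases header names
    clientIp: clientIp || null,
    friendlyName: segments[0] || null,
    encoding: segments[1] || null
  };
}
```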
You’ll find details in the included source code, but we want to highlight a few aspects:
- The origin IP address is converted to an integer number using a formula defined by MaxMind (bit.ly/15xuuJE). This number is then used in the following query to find the corresponding country in the MongoDB database:
{ convstart: { $lte: ipValue }, convend: { $gte: ipValue } }
The ipValue is compared against the range defined between the convstart (less than or equal to) and the convend (greater than or equal to) columns provided in the MaxMind table. If no value is found, the country is set to “undefined.”
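Here’s a minimal sketch of that conversion and query, following MaxMind’s formula for an IPv4 address w.x.y.z (ipnum = 16777216*w + 65536*x + 256*y + z). The helper names are ours; the resulting object is what you would pass to the MongoDB driver’s find method against the imported collection.

```javascript
// Converts a dotted IPv4 address to MaxMind's integer form.
// Plain multiplication is used instead of bit shifts, which would produce
// negative numbers for addresses above 127.255.255.255.
function ipToInteger(ip) {
  const parts = ip.split(".");
  const valid = parts.length === 4 && parts.every(function (p) {
    return /^\d{1,3}$/.test(p) && Number(p) <= 255;
  });
  if (!valid) throw new Error("Not an IPv4 address: " + ip);
  const [w, x, y, z] = parts.map(Number);
  return w * 16777216 + x * 65536 + y * 256 + z;
}

// Builds the range query shown above for a given address.
function countryQuery(ip) {
  const ipValue = ipToInteger(ip);
  return { convstart: { $lte: ipValue }, convend: { $gte: ipValue } };
}
```

For example, 1.2.3.4 becomes 16777216 + 131072 + 768 + 4 = 16909060.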
- The main streaming process occurs in the request command inside the function streamVideo, which looks like this:
request({options},{callback_function}).pipe(resp);
Thanks to the request module, which simplifies this process, data is piped back to the client as soon as it’s received, instead of being buffered in full on the server. This turns our reverse proxy into a fast, efficient application. The keep-alive header (found in the full code among the options) is extremely important: it improves server performance by keeping the connection open across multiple requests to the server.
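For comparison, here’s a minimal sketch of the same streaming step using only Node.js’s built-in http module; it isn’t the sample’s code. It assumes an http: (not https:) publish URL, relays an illustrative whitelist of response headers, and forwards the client’s Range header so the player can seek (the origin can then answer with 206 Partial Content). The function names are hypothetical.

```javascript
const http = require("http");

// Options for http.request built from the published video URL.
// Only the Range header is forwarded from the client.
function buildUpstreamOptions(videoUrl, clientHeaders, agent) {
  const target = new URL(videoUrl);
  const headers = { connection: "keep-alive" };
  if (clientHeaders.range) headers.range = clientHeaders.range;
  return {
    protocol: target.protocol, // the default agent accepts only "http:"
    hostname: target.hostname,
    port: Number(target.port) || 80,
    path: target.pathname + target.search,
    method: "GET",
    headers: headers,
    agent: agent
  };
}

// Response headers relayed to the player (illustrative whitelist).
const RELAYED_HEADERS = ["content-type", "content-length", "content-range", "accept-ranges"];

// Pipes the upstream response to the client as data arrives.
function proxyVideo(videoUrl, req, res, agent) {
  const upstreamReq = http.request(
    buildUpstreamOptions(videoUrl, req.headers, agent),
    function (upstreamRes) {
      const headers = {};
      RELAYED_HEADERS.forEach(function (name) {
        if (upstreamRes.headers[name] !== undefined) {
          headers[name] = upstreamRes.headers[name];
        }
      });
      res.writeHead(upstreamRes.statusCode, headers); // e.g. 200 or 206
      upstreamRes.pipe(res);
    });
  upstreamReq.on("error", function () {
    if (!res.headersSent) res.writeHead(502);
    res.end();
  });
  upstreamReq.end();
}
```

Inside the server callback, you would call proxyVideo(videoUrl, request, response, agent) once validation succeeds.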
Testing the Reverse Proxy Server Locally
You can test the application locally by downloading the Node.js installer from nodejs.org. Open a command prompt window in the folder where you copied the source files and execute the command “npm install,” which will install the required modules. Next, set the different options inside the config.json file, including the Windows Azure Table Storage information, the MongoDB connection string and the location of the videos containing the error messages. You can now start the solution in the same command prompt window by typing “node server.js.”
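Because a missing entry in config.json tends to surface later as a confusing connection error, it can help to check the file at startup during local runs. This is a minimal sketch; the helper names are hypothetical, and the setting names in the usage line are placeholders except PORT_NUMBER, which server.js reads through nconf.

```javascript
const fs = require("fs");

// Returns the required keys that are absent or empty in the config object.
function missingSettings(config, requiredKeys) {
  return requiredKeys.filter(function (key) {
    return config[key] === undefined || config[key] === "";
  });
}

// Reads and parses config.json, failing fast if a setting is missing.
function loadConfig(path, requiredKeys) {
  const config = JSON.parse(fs.readFileSync(path, "utf8"));
  const missing = missingSettings(config, requiredKeys);
  if (missing.length > 0) {
    throw new Error("config.json is missing: " + missing.join(", "));
  }
  return config;
}
```

A call such as loadConfig("./config.json", ["PORT_NUMBER", "STORAGE_KEY", "MONGODB_URL"]) would stop the server with a clear message before any connection is attempted.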
Deploying the Reverse Proxy to Windows Azure Web Sites
In order to deploy the reverse proxy to Windows Azure, create a new Web site in the management portal and enable Git deployment, which lets you check in code from any Git repository (including local ones). You’ll find specific instructions on how to accomplish this at bit.ly/KCQo9V. The source code files already include package.json, which defines the required modules and engine version for the solution to work in Windows Azure. Once the Web site is created, you can deploy directly from the command prompt window by navigating to the folder where the files are located and executing the following commands:
> git init
> git add .
> git commit -m "Initial commit"
> git push {git URL from Windows Azure portal} master
After you provide the required credentials, the code will be deployed to Windows Azure.
We’ve created a Web site at rvvideo.azurewebsites.net where you can see a sample page that includes the popular HTML5-compatible JW Player, pointing to our reverse proxy (Figure 5). If you’re located in the United States, you’ll be able to watch the video; otherwise, you’ll get a warning about the video not being available in your country or region. There’s also a tab on the Web site pointing to a localhost server, in order to facilitate local debugging.
Figure 5 Sample Web Site Pointing to the Node.js Proxy Server
Wrapping Up
In this example, we’ve shown how to combine multiple cloud components to add an extra layer of geolocation security to videos stored in Windows Azure blob storage. Intercepting, routing and manipulating HTTP requests is one of the use cases where Node.js shines. We’ve also shown how simple it is to interact with Windows Azure services such as Table Storage, as well as third-party providers in the Windows Azure Store such as MongoLab. Finally, by deploying our solution to Windows Azure Web Sites, we can scale its capacity based on demand and traffic.
Bruno Terkaly is a developer evangelist for Microsoft. His depth of knowledge comes from years of experience in the field, writing code using a multitude of platforms, languages, frameworks, SDKs, libraries and APIs. He spends time writing code, blogging and giving live presentations on building cloud-based applications, specifically using the Windows Azure platform. You can read his blog at blogs.msdn.com/b/brunoterkaly.
Ricardo Villalobos is a seasoned software architect with more than 15 years of experience designing and creating applications for companies in the supply chain management industry. Holding different technical certifications, as well as a master’s degree in business administration from the University of Dallas, he works as a cloud architect in the Windows Azure CSV incubation group for Microsoft. You can read his blog at blog.ricardovillalobos.com.
Bruno and Ricardo jointly present at large industry conferences. They encourage readers of Windows Azure Insider to contact them for availability. Bruno can be reached at bterkaly@microsoft.com and Ricardo can be reached at Ricardo.Villalobos@microsoft.com.
Thanks to the following technical expert for reviewing this article: David Makogon (Microsoft)