November 2015

Volume 30 Number 12

ASP.NET - Use ASP.NET as a High-Performance File Downloader

By Doug Duerner | November 2015

Slow and faulty connections have long been the bane of large file downloads. You can be in an airport concourse gathering media over a sketchy WiFi connection to work on a presentation during a long flight, or on the African savannah trying to download a large installation file via satellite link for a solar-powered water pump. In either instance, the cost of having a large file download crash is the same: time lost, productivity drained and the success of the assignment imperiled.

It doesn’t have to be that way. In this article we show how to create a utility that resumes and continues downloads that fail over poor connections prone to dropping offline during large file transfers.

Background

We wanted to create a simple file downloader utility that could easily be added to your existing IIS Web Server, with an extremely simple and easy-to-use client program (or the option of merely using the Web browser as a client).

The IIS Web Server has already proven to be a highly scalable, enterprise-grade Web server, serving up files to browsers for years. We basically wanted to take advantage of the IIS Web Server’s ability to handle many HTTP Web requests at the same time, in parallel, and apply that to file downloading (copying).

Essentially, we needed a file downloader utility that could download huge files to users around the world, some of them in remote regions with slow and often faulty network links. Because some remote users might still be on modem links or faulty satellite links that go offline at random times or intermittently toggle between online and offline, the utility needed to be extremely resilient, with the ability to retry only the portions of the file that failed to download. We didn’t want a user who had spent all night downloading a huge file over a slow link to have to start the entire download over because of one small hiccup in the network link. We also needed to ensure the huge files being downloaded weren’t buffered in server memory, keeping server memory usage minimal so it wouldn’t keep rising toward server failure when many users downloaded files at the same time.

Conversely, if the user was lucky enough to have a reliable high-speed network link—with both the client and server machines being high-end computers equipped with multiple CPUs and network cards—we wanted the user to be able to download a file using multiple threads and multiple connections, allowing the download of multiple chunks of the file at the same time in parallel using all hardware resources, while at the same time using minimal server memory.

In a nutshell, we created a simple, multithreaded, parallel, low-memory-usage file download utility that can divide the file into chunks, download the individual chunks on separate threads and allow a user to retry only the chunks that failed to download.

The sample project that accompanies this article contains the code for the file download utility and provides a rudimentary base infrastructure that can be expanded going forward, allowing you to get more sophisticated as need arises.

Sample Project Overview

In essence, DownloadHandler.dll transforms an existing IIS Web Server into a multithreaded file downloader that lets you download a file in chunks, in parallel, using a simple URL from the standalone executable client (FileDownloader.exe), as shown in Figure 1. Note that the parameter (chunksize=5242880) is optional; if it's not included, the entire file is downloaded in one chunk. Figure 2 and Figure 3 demonstrate how the utility lets you repeatedly retry only the failed portions of the file until they succeed, without having to restart the entire download from the beginning, as most other file-downloading software requires.

Figure 1 High-Level Design Overview of Processing Flow for DownloadHandler.dll (Using FileDownloader.exe as Client)

Figure 2 Standalone Executable as Download Client (with Failed Chunks)

Figure 3 Standalone Executable as Download Client (After Retry)

Figure 1 is a high-level overview of the design of DownloadHandler.dll and FileDownloader.exe, showing the processing flow as the chunks of the file on the server machine’s hard drive pass through DownloadHandler.dll and FileDownloader.exe into the file on the client machine’s hard drive, illustrating the HTTP protocol headers involved in that process.

In Figure 1, FileDownloader.exe initiates a file download by calling the server using a simple URL, which contains the name of the file you want to download as a URL query string parameter (file=file.txt), and internally uses the HTTP method (HEAD), so that initially the server will send back only its response headers, one of which contains the total file size. The client then uses a Parallel.ForEach construct to iterate, splitting the total file size into chunks (byte ranges) based on the chunk size in the parameter (chunksize=5242880). For each individual iteration, the Parallel.ForEach construct executes a processing method on a separate thread, passing in the associated byte range. Inside the processing method, the client issues an HttpWebRequest call to the server using the same URL and internally appends an HTTP request header containing the byte range supplied to that processing method (that is, Range: bytes=0-5242880, Range: bytes=5242880-10485760 and so on).
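The sample project's actual code differs, but a minimal sketch of that client-side flow might look like the following. The class, method and parameter names are illustrative assumptions, not the sample's code, and the synchronous CopyTo at the end stands in for the overlapped write described in the next section:

using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Threading.Tasks;

class DownloadClientSketch
{
    // Ask the server for its response headers only; ContentLength is the total file size.
    static long GetTotalFileSize(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "HEAD";
        using (var response = (HttpWebResponse)request.GetResponse())
            return response.ContentLength;
    }

    // Split the file into byte ranges and download each range on its own thread.
    public static void DownloadInChunks(string url, string destinationPath, long chunkSize)
    {
        long totalSize = GetTotalFileSize(url);

        var ranges = new List<Tuple<long, long>>();
        for (long offset = 0; offset < totalSize; offset += chunkSize)
            ranges.Add(Tuple.Create(offset, Math.Min(offset + chunkSize, totalSize) - 1));

        Parallel.ForEach(ranges, range =>
        {
            var request = (HttpWebRequest)WebRequest.Create(url);
            request.AddRange(range.Item1, range.Item2);   // emits "Range: bytes=start-end"
            using (var response = (HttpWebResponse)request.GetResponse())
            using (var stream = response.GetResponseStream())
                WriteChunkToFile(destinationPath, range.Item1, stream);
        });
    }

    // Write the returned bytes at the chunk's offset in the local file. The file is
    // opened for asynchronous (overlapped) I/O; the sample project does the actual
    // write with manually posted overlapped I/O rather than this simplified copy.
    static void WriteChunkToFile(string path, long offset, Stream source)
    {
        using (var file = new FileStream(path, FileMode.OpenOrCreate, FileAccess.Write,
            FileShare.ReadWrite, 81920, FileOptions.Asynchronous))
        {
            file.Seek(offset, SeekOrigin.Begin);
            source.CopyTo(file);
        }
    }
}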

On the server machine, our implementation of the IHttpAsyncHandler interface (System.Web.IHttpAsyncHandler) handles each request on a separate thread, executing the HttpResponse.TransmitFile method to write the requested byte range of the server machine's file directly to the network stream, with no explicit buffering, so the memory impact on the server is almost non-existent. The server sends back its response with an HTTP Status Code 206 (PartialContent) and internally appends the HTTP response header identifying the byte range being returned (that is, Content-Range: bytes 0-5242880/26214400, Content-Range: bytes 5242880-10485760/26214400 and so on). As each thread receives the HTTP response on the client machine, it writes the bytes returned in the response to the portion of the file on the client machine's hard drive identified by the Content-Range HTTP response header. It uses asynchronous overlapped file I/O to ensure the Windows I/O Manager doesn't serialize the I/O requests before dispatching the I/O Request Packets to the kernel-mode driver that completes the file write operation. If multiple user-mode threads all write to a file that isn't opened for asynchronous overlapped I/O, the requests are serialized and the kernel-mode driver receives only one request at a time. For more information on asynchronous overlapped I/O, see “Getting Your Driver to Handle More Than One I/O Request at a Time” (bit.ly/1NIaqxP) and “Supporting Asynchronous I/O” (bit.ly/1NIaKMW) on the Hardware Dev Center site.
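The server-side portion of that flow comes down to a few calls on the HttpResponse. Here's a hedged sketch of serving one requested byte range; the sample project's actual IHttpAsyncHandler implementation is more involved, and the class and method names here are assumptions:

using System.IO;
using System.Web;

static class ByteRangeSender
{
    public static void SendByteRange(HttpContext context, string filePath,
        long offset, long length)
    {
        long fileLength = new FileInfo(filePath).Length;

        context.Response.StatusCode = 206;   // PartialContent
        context.Response.AppendHeader("Content-Range",
            string.Format("bytes {0}-{1}/{2}", offset, offset + length, fileLength));

        // TransmitFile writes the requested portion of the file directly to the
        // network stream without buffering it in server memory.
        context.Response.TransmitFile(filePath, offset, length);
    }
}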

To implement the asynchronicity in our IHttpAsyncHandler, we manually post an overlapped I/O structure to the I/O completion port, and the CLR ThreadPool runs the completion delegate supplied in the overlapped structure on a completion port thread. These are the same completion port threads used by most of the built-in async methods. Generally, it’s best to use the new built-in async methods for most I/O-bound work, but in this case we wanted to use the HttpResponse.TransmitFile function due to its outstanding ability to transfer huge files without explicitly buffering them in server memory. It’s amazing!
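As a rough illustration of that mechanism (a hedged sketch, not the sample's IOThread code), a completion delegate can be packed into an overlapped structure and posted to the completion port as shown below; this requires compiling with /unsafe:

using System;
using System.Threading;

static class CompletionPortPoster
{
    public static unsafe void Post()
    {
        // Pack the callback into a NativeOverlapped and hand it to the I/O completion
        // port; the CLR ThreadPool will invoke the callback on a completion port thread.
        var overlapped = new Overlapped();
        NativeOverlapped* pNative = overlapped.Pack(OnCompletion, null);
        ThreadPool.UnsafeQueueNativeOverlapped(pNative);
    }

    private static unsafe void OnCompletion(uint errorCode, uint numBytes,
        NativeOverlapped* pOverlapped)
    {
        try
        {
            // This code executes on an I/O completion port thread, not a worker thread.
            Console.WriteLine("Running on a completion port thread.");
        }
        finally
        {
            Overlapped.Free(pOverlapped);   // release the packed overlapped structure
        }
    }
}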

Parallel.ForEach is primarily intended for CPU-bound work and should generally not be used in a server implementation because of its blocking nature. We offload the work to a completion port thread from the CLR ThreadPool, instead of a regular worker thread from the CLR ThreadPool, to avoid depleting the same threads IIS uses to service incoming requests. The more efficient manner in which the completion port processes work also somewhat limits thread consumption on the server. A diagram with a more detailed explanation, highlighting the differences between the completion port threads and worker threads in the CLR ThreadPool, is included in the sample project code in the comment section at the top of the IOThread class.

Because scaling to millions of users isn't the primary goal of this utility, we can afford to expend the additional server threads required to run the HttpResponse.TransmitFile function in order to achieve the associated memory savings on the server when transferring massive files. Essentially, we're trading the loss of scalability caused by using additional threads on the server (instead of the built-in async methods with no threads) for the ability to use the HttpResponse.TransmitFile function, which consumes extraordinarily little server memory. Although it's outside the scope of this article, you could optionally use the built-in async methods in combination with unbuffered file I/O to achieve a similar memory savings with no additional threads, but from what we understand, everything must be sector aligned and it's somewhat difficult to implement properly. On top of that, it appears Microsoft has purposely removed the NoBuffering item from the FileOptions enum in order to prevent unbuffered file I/O, requiring a manual hack to even make it possible. We were wary of the risks of getting that wrong and decided to go with the less risky option of HttpResponse.TransmitFile, which has been fully tested.

FileDownloader.exe can launch multiple threads, each issuing a separate HttpWebRequest call corresponding to a separate portion (byte range) of the file being downloaded based on the total size of the file divided into the “Chunk Bytes” specified, as shown in Figure 2.

Any thread that fails to download the portion of the file (byte range) specified in its HttpWebRequest call can be retried by merely making the same HttpWebRequest call (for only that failed byte range) repeatedly until it eventually succeeds, as shown in Figure 3. You won’t lose the portions of the file already downloaded, which in the case of a slow connection can mean many hours of downloading time saved. You can virtually eliminate the negative impact of a faulty connection that’s continually going offline. And with the design’s multiple threads downloading different portions of the file at the same time in parallel—directly to the network stream with no explicit buffering—and onto the hard drive with asynchronous overlapped file I/O, you can maximize the amount of downloading accomplished during the window of time when a flaky connection is actually online. The tool will continue to finish the remaining portions each time the network link comes back online, without losing any work. We like to think of it as more of a “retryable” file downloader, not a “resumable” file downloader.

The difference can be illustrated in a hypothetical example. You’re going to download a large file that will take all night. You start a resumable file downloader when you leave work and let it run. When you arrive at work in the morning, you see the file download failed at 10 percent and is ready to be resumed. But when it resumes, it will still need to run overnight again in order to finish the remaining 90 percent.

In contrast, you start our retryable file downloader when you leave work and let it run all night. When you arrive at work in the morning, you see that one chunk failed at the 10 percent mark, but the downloader continued with the rest of the chunks of the file. After encountering that failed chunk, caused by a momentary hiccup in the network link, it went ahead and finished the remaining 90 percent over the rest of the night when the network link came back online. Now you only have to retry that one chunk and you're done.
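To make that retry behavior concrete, here's a minimal, illustrative retry method that could be added to the earlier client sketch; the 30-second delay and the names are assumptions, not the sample's code:

using System;
using System.Net;
using System.Threading;

// Keep re-requesting a single failed byte range until it succeeds,
// leaving already-completed chunks untouched.
static void DownloadChunkWithRetry(string url, string destinationPath, long start, long end)
{
    while (true)
    {
        try
        {
            var request = (HttpWebRequest)WebRequest.Create(url);
            request.AddRange(start, end);   // only the failed byte range
            using (var response = (HttpWebResponse)request.GetResponse())
            using (var stream = response.GetResponseStream())
                WriteChunkToFile(destinationPath, start, stream);   // helper from the earlier sketch
            return;   // this chunk finally succeeded
        }
        catch (WebException)
        {
            // The link is presumably offline; wait a bit and try the same range again.
            Thread.Sleep(TimeSpan.FromSeconds(30));
        }
    }
}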

The default download client built into the Web browser can also be used, with a URL such as https://localhost/DownloadPortal/Download?file=test.txt&chunksize=5242880.

Note that the parameter (chunksize=5242880) is also optional when using the Web browser as a download client. If it's not included, the server sends the entire file in one chunk using a single HttpResponse.TransmitFile call. If it is included, the server executes a separate HttpResponse.TransmitFile call for each chunk.

Figure 4 is a high-level overview of the design of DownloadHandler.dll when using a Web browser that doesn’t support partial content as a download client. It illustrates the processing flow as the chunks of the file on the server machine’s hard drive pass through DownloadHandler.dll and the Web browser into the file on the Web browser machine’s hard drive.

Figure 4 High-Level Design Overview of Processing Flow for DownloadHandler.dll (Using Web Browser That Doesn’t Support Partial Content as Client)

A cool feature of our IHttpAsyncHandler implementation on the IIS Web Server is its support for “byte serving”: it sends the Accept-Ranges HTTP header in its HTTP response (Accept-Ranges: bytes), telling clients it will serve up portions of a file (partial content ranges). If the default download client inside the Web browser supports partial content, it can send the server the Range HTTP header in its HTTP request (Range: bytes=5242880-10485760), and when the server sends the partial content back, it includes the Content-Range HTTP header in its HTTP response (Content-Range: bytes 5242880-10485760/26214400). So, depending on which Web browser you're using and the default download client built into it, you might get some of the same benefits as our standalone executable client. Regardless, most Web browsers let you plug in your own custom download client, replacing the built-in default.
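For reference, a typical byte-serving exchange looks roughly like this on the wire, using the example byte ranges from the figures (headers abbreviated for clarity):

GET /DownloadPortal/Download?file=test.txt HTTP/1.1
Host: localhost
Range: bytes=5242880-10485760

HTTP/1.1 206 Partial Content
Accept-Ranges: bytes
Content-Range: bytes 5242880-10485760/26214400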

Sample Project Configuration

For the sample project, simply copy DownloadHandler.dll and IOThreads.dll into the \bin directory under the virtual directory and put an entry in the handlers section and modules section of the web.config, like so:

<handlers>
  <add name="Download" verb="*" path="Download"
    type="DownloaderHandlers.DownloadHandler" />
</handlers>
<modules>
  <add name="CustomBasicAuthenticationModule" preCondition="managedHandler"
    type="DownloaderHandlers.CustomBasicAuthenticationModule" />
</modules>

If there’s no virtual directory on the IIS Server, create one with a \bin directory, make it an Application and make sure it’s using a Microsoft .NET Framework 4 Application Pool.

The custom basic authentication module uses the same easy-to-use AspNetSqlMembershipProvider used on many ASP.NET Web sites today, storing the username and password required to download a file in the aspnetdb database on the SQL Server. One of the handy benefits of AspNetSqlMembershipProvider is that the user needn't have an account on the Windows domain. Detailed instructions on how to install AspNetSqlMembershipProvider, along with the settings required on the IIS Server to configure the user accounts and SSL certificate, are listed in the sample project code in the comment section at the top of the CustomBasicAuthenticationModule class. The other advanced configuration options used for tuning the IIS Server have usually already been set by the IT department that manages the server and are beyond the scope of this article; if that's not the case, they're readily available in the TechNet Library at bit.ly/1JRJjNS.
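For reference, the general shape of a basic authentication module that validates credentials against the membership provider looks something like the following. This is a hedged sketch, not the sample's actual CustomBasicAuthenticationModule, and the class name and realm are assumptions:

using System;
using System.Text;
using System.Web;
using System.Web.Security;

public class BasicAuthenticationSketchModule : IHttpModule
{
    public void Init(HttpApplication application)
    {
        application.AuthenticateRequest += OnAuthenticateRequest;
    }

    private static void OnAuthenticateRequest(object sender, EventArgs e)
    {
        var context = ((HttpApplication)sender).Context;
        string header = context.Request.Headers["Authorization"];

        if (header != null && header.StartsWith("Basic ", StringComparison.OrdinalIgnoreCase))
        {
            // Decode "username:password" from the Base64 payload.
            string credentials = Encoding.UTF8.GetString(
                Convert.FromBase64String(header.Substring(6)));
            int separator = credentials.IndexOf(':');
            string user = credentials.Substring(0, separator);
            string password = credentials.Substring(separator + 1);

            // Validate against AspNetSqlMembershipProvider (the aspnetdb database).
            if (Membership.ValidateUser(user, password))
                return;   // authenticated; let the download handler run
        }

        // Missing or invalid credentials: challenge the client.
        context.Response.StatusCode = 401;
        context.Response.AppendHeader("WWW-Authenticate", "Basic realm=\"DownloadPortal\"");
        context.ApplicationInstance.CompleteRequest();
    }

    public void Dispose() { }
}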

You’re done. It’s as easy as that.

Compelling Factors

The foremost compelling factor of the design is not that it's faster, but that it's more resilient and fault-tolerant in the face of network outages caused by flaky, unstable network links that continually go online and offline. Typically, downloading one file, in one chunk, over one connection yields the maximum throughput.

There are some unique exceptions to this rule, such as a mirrored server environment where a file is downloaded in separate pieces, with each piece coming from a different mirror server, as shown in Figure 5. But generally, downloading a file on multiple threads is actually slower than downloading it on one thread, because the network is typically the bottleneck. However, being able to retry solely the failed portions of the file download repeatedly until they succeed, without having to restart the entire download process, yields what we like to think of as a sort of quasi-fault tolerance.

Figure 5 Hypothetical Future Enhancements to Simulate an Extremely Rudimentary Mirror Infrastructure

Also, if someone were to modify the design as a future enhancement to simulate an extremely rudimentary mirror server infrastructure, as shown in Figure 5, it would provide what could be thought of as a sort of quasi-redundancy.

Essentially, the design lets you reliably download a file over an unreliable network. A brief hiccup in your network link doesn’t mean you have to start over from the beginning; instead, you can simply retry only the pieces of the file that failed. A nice addition to the design (that would make it even more resilient) would be to store the current progress state of the download to a file on the hard drive as the download is progressing, so you could essentially retry a failed download even across client application and client machine restarts. But that will be an exercise left to the reader.
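One possible starting point for that exercise, purely as an illustration (the file format and names here are assumptions), is to persist each chunk's starting offset and completion status to a small text file as the download runs and reload it after a restart to decide which chunks still need to be retried:

using System.Collections.Generic;
using System.IO;
using System.Linq;

static class ProgressStore
{
    // Write one "offset,completed" line per chunk.
    public static void Save(string progressPath, IDictionary<long, bool> completedByOffset)
    {
        File.WriteAllLines(progressPath,
            completedByOffset.Select(kvp => kvp.Key + "," + kvp.Value));
    }

    // Reload the progress file (if any) after a client or machine restart.
    public static Dictionary<long, bool> Load(string progressPath)
    {
        if (!File.Exists(progressPath))
            return new Dictionary<long, bool>();

        return File.ReadAllLines(progressPath)
            .Select(line => line.Split(','))
            .ToDictionary(parts => long.Parse(parts[0]), parts => bool.Parse(parts[1]));
    }
}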

Another compelling factor, which rivals the aforementioned in prominence, lies in the use of HttpResponse.TransmitFile on the server to write the bytes of the file directly to the network stream, with no explicit buffering, in order to minimize the impact on server memory. It's surprising how negligible the impact on server memory is, even when downloading extremely large files.

There are three additional factors that are far less significant, but compelling nonetheless.

First, because the design includes both the front-end client and the back-end server, you have complete control over the server-side configuration. This gives you the freedom to adjust configuration settings that can often greatly impede the file-downloading process on servers owned by someone else and out of your control. For example, you can raise the connection limit imposed per client IP address above the usual limit of two connections, and you can raise the throttling limit per client connection.
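And because you also control the client, one related knob is the .NET Framework's default of two concurrent connections per host, which FileDownloader.exe could raise at startup; the value 16 here is purely illustrative:

using System.Net;

static class ClientConnectionConfig
{
    // Call once early in the client's startup code; the default limit is 2.
    public static void RaiseConnectionLimit()
    {
        ServicePointManager.DefaultConnectionLimit = 16;
    }
}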

Second, the sample project code inside our front-end client (FileDownloader.exe) and our back-end server (DownloadHandler.dll) can serve as simple, clear blocks of sample code demonstrating the use of the HTTP request and response headers necessary to facilitate partial content byte ranges in the HTTP protocol. It’s easy to see what HTTP request headers the client must send in order to request the byte ranges, and what HTTP response headers the server must send to return the byte ranges as partial content. It should be relatively easy to modify the code to implement higher-level functionality on top of this simple base functionality, or implement some of the more advanced functionality available in more sophisticated software packages. Also, you can use it as a simple starting template that makes it relatively easy to add support for some of the other more advanced HTTP headers, such as Content-Type: multipart/byteranges, Content-MD5: md5-digest, If-Match: entity-tag and so on.

Third, because the design uses the IIS Web Server, you automatically benefit from some of the built-in functionality provided by the server. For example, the communication can automatically be encrypted (using HTTPS with an SSL certificate) and compressed (using gzip compression). However, it might not be advisable to run gzip compression on extremely large files if doing so results in too much stress on your server CPUs. But, in the event your server CPUs can shoulder the additional load, the efficiency of transferring much smaller compressed data can sometimes make a big difference in the overall throughput of the entire system.
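If you do decide to enable compression, it's an IIS-level setting rather than part of the sample project; a typical (hedged) web.config example looks like this:

<system.webServer>
  <urlCompression doStaticCompression="true" doDynamicCompression="true" />
</system.webServer>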

Future Improvements

The sample project code only provides the minimum core functionality required for the file downloader to operate. Our goal was to keep the design simple and easy to understand so it could be used relatively effortlessly as a base upon which to add enhancements and additional functionality. It’s merely a starting point and base template. Many additional enhancements would be absolutely essential before it could even begin to be used in a production environment. Adding a higher-level abstraction layer that provides this additional, more-advanced functionality is left as an exercise for the reader. However, we’ll expound on several of the more crucial enhancements.

The sample project code doesn't currently include an MD5 hash checksum on the file. In the real world, it's essential to employ some sort of file checksum strategy to ensure the file downloaded to the client matches the file on the server, and that it hasn't been tampered with or altered in any way. The HTTP headers make this easy to do with the header (Content-MD5: md5-digest). In fact, one of our first prototypes included performing an MD5 hash checksum on the file each time the file was requested and placing the digest into the header (Content-MD5: md5-digest) before the file left the server. The client would then perform the same MD5 hash checksum on the file it received and verify the resulting digest matched the digest in the header (Content-MD5: md5-digest) returned by the server. If it didn't match, the file had been tampered with or corrupted. Although this accomplishes the goal of ensuring the file isn't changed, hashing large files on every request caused intense CPU pressure on the server and took far too long.

In reality, it will probably require some sort of cache layer that performs the MD5 hash checksum processing on the file (in the background) once for the life of the file and stores the resulting digest in a dictionary with the file name as the key. Then a simple dictionary lookup is all that's required on the server to obtain the digest for the file, and the digest can be added to the header as the file leaves the server, with minimal impact on server CPUs.
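A hedged sketch of such a cache layer (the class and method names are assumptions, not the sample's code) might look like this:

using System;
using System.Collections.Concurrent;
using System.IO;
using System.Security.Cryptography;
using System.Threading.Tasks;

static class Md5DigestCache
{
    private static readonly ConcurrentDictionary<string, string> Digests =
        new ConcurrentDictionary<string, string>(StringComparer.OrdinalIgnoreCase);

    // Kick off the one-time background hash for a file.
    public static Task PrecomputeAsync(string filePath)
    {
        return Task.Run(() =>
        {
            using (var md5 = MD5.Create())
            using (var stream = File.OpenRead(filePath))
            {
                // Content-MD5 carries the Base64-encoded 128-bit digest.
                Digests[filePath] = Convert.ToBase64String(md5.ComputeHash(stream));
            }
        });
    }

    // Cheap dictionary lookup performed as the file leaves the server.
    public static bool TryGetDigest(string filePath, out string digest)
    {
        return Digests.TryGetValue(filePath, out digest);
    }
}

With something like this in place, the handler could attach the cached digest (for example, via Response.AppendHeader("Content-MD5", digest)) without re-hashing the file on every request.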

The sample project code also doesn’t currently restrict a client from using a gigantic number of threads and splitting the file into an enormous number of chunks. It basically allows a client to “do what it needs to do” to ensure it can download a file. In the real world, there would probably need to be some sort of infrastructure that could impose a limit on the client, so one client wouldn’t be able to hijack the server and starve all the other clients.

Figure 5 illustrates a hypothetical future enhancement to simulate an extremely rudimentary mirror infrastructure by modifying the design to supply a list of “node name/byte range” pairs as the URL query string parameter, instead of the current design's “chunksize” parameter. The current design could be modified relatively easily to get each chunk of the file from a different server by merely iterating the “node name/byte range” pairs and launching an HttpWebRequest for each pair, instead of internally splitting the total file size into chunks based on the “chunksize” parameter and launching an HttpWebRequest for each chunk.

You could construct the URL for the HttpWebRequest by merely replacing the server name with the associated node name from the list of “node name/byte range” pairs, adding the associated byte range to the Range HTTP header (that is, Range: bytes=0-5242880), and then removing the “node name/byte range” list from the URL entirely. Some sort of metadata file could identify on which servers the pieces of a file are located, and the requesting machine could then assemble the one file from pieces of the file that are spread across different servers.

If a file is mirrored on 10 servers, the design could be modified to get piece 1 of the file from the server 1 mirror copy, piece 2 of the file from the server 2 mirror copy, piece 3 of the file from the server 3 mirror copy and so on. Again, it would be essential to do an MD5 hash checksum on the file after you retrieved all the pieces and reassembled the full file on the client, in order to make sure no chunks were corrupted on any of the mirror servers and that you did in fact receive the entire file. You could even get a little fancier and take it to the next level by geographically distributing the servers across the country and building some elaborate intelligence into the code that determines which servers are under the least processing load, then using those servers to service the request and return the chunks of the file.
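A purely hypothetical sketch of the basic “node name/byte range” approach described above, written as a method that could be added to the earlier client sketch (the pair format, URL and names are all assumptions), might look like this:

using System;
using System.Collections.Generic;
using System.Net;
using System.Threading.Tasks;

// (node name, start, end) tuples tell the client which mirror holds each piece.
static void DownloadFromMirrors(string destinationPath, string file,
    IEnumerable<Tuple<string, long, long>> nodeRangePairs)
{
    Parallel.ForEach(nodeRangePairs, pair =>
    {
        // Build the URL against the mirror node that holds this piece of the file.
        string url = string.Format(
            "https://{0}/DownloadPortal/Download?file={1}", pair.Item1, file);

        var request = (HttpWebRequest)WebRequest.Create(url);
        request.AddRange(pair.Item2, pair.Item3);   // Range: bytes=start-end
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var stream = response.GetResponseStream())
            WriteChunkToFile(destinationPath, pair.Item2, stream);   // helper from the earlier sketch
    });
}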

Wrapping Up

The goal of our design wasn’t to create a faster, more scalable file downloader, but to create one that’s extremely resilient to momentary network outages.

We took great care to make sure the design was extremely simple and clearly demonstrated how to use the HTTP protocol headers for “byte serving” byte ranges and partial content.

In our research, we actually found it quite difficult to find a good, clear example of how to do simple HTTP byte serving, and how to properly use the byte range headers in the HTTP protocol. Most examples were either unnecessarily complex or used many of the other headers to implement much more advanced features in the HTTP protocol, making it difficult to understand, let alone try to enhance or expand on going forward.

We wanted to provide you with a simple, solid base that includes only the minimum necessary, so it would be relatively easy to experiment and incrementally add more advanced functionality over time, or even to implement an entire higher-level abstraction layer that adds some of the more advanced features of the HTTP protocol.

We simply wanted to provide a straightforward example to learn from and build on going forward. Enjoy!


Doug Duerner is a senior software engineer with more than 15 years designing and implementing large-scale systems with Microsoft technologies. He has worked for several Fortune 500 banking institutions and for a commercial software company that designed and built the large-scale distributed network management system used by the Department of Defense’s Defense Information Systems Agency (DISA) for its “Global Information Grid” and the Department of State. He is a geek at heart, focusing on all aspects, but enjoys the most complex and challenging technical hurdles, especially those that everyone says “can’t be done.” Duerner can be reached at coding.innovation@gmail.com.

Yeon-Chang Wang is a senior software engineer with more than 15 years designing and implementing large-scale systems with Microsoft technologies. He, too, has worked for a Fortune 500 banking institution and for a commercial software company that designed and built the large-scale distributed network management system used by the Department of Defense’s Defense Information Systems Agency (DISA) for its “Global Information Grid” and the Department of State. He also designed and implemented a large-scale Driver Certification System for one of the world’s largest chip manufacturers. Wang has a master’s degree in Computer Science. He eats complex problems for dinner and can be reached at yeon_wang@yahoo.com.

Thanks to the following Microsoft technical experts for reviewing this article: Stephen Cleary and James McCaffrey