Disposable, Finalizers, and HttpClient

12/29/2017

In this essay I share an investigation into disposal and finalizers in C#, and how they relate to the correct usage of HttpClient.

Abstract

We start with a general introduction to disposal and finalization in C#. I wanted to know whether disposal is required, whether it happens during garbage collection, and whether unmanaged resources are released when the process terminates, and by whom. Then we investigate the correct usage of HttpClient with regard to disposal.

Let's take these one at a time.

Finalizers

First, let's check whether finalizers run when garbage is collected and when the process terminates.

 
    using System;

    class Program
    {
        static A a = new A() { a = "a" };

        static void Main(string[] args)
        {
            Foo();
            Console.WriteLine("Starting to collect");
            GC.Collect();
            Console.WriteLine("Finished collecting");
            Console.ReadLine();
            Console.WriteLine("Ending process");
        }

        static void Foo()
        {
            Console.WriteLine("Starting Foo");
            A b = new A() { a = "b" };
            Console.WriteLine("Finishing Foo");
        }
    }

    class A
    {
        public string a;

        ~A()
        {
            Console.WriteLine("Finalize " + a);
        }
    }

That's the output we get:

 
Starting Foo
Finishing Foo
Starting to collect
Finished collecting
Finalize b

Ending process
Finalize a

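So it looks like, if the process terminates normally, finalizers do get run: the one for b after the explicit collection, and the one for the static a at process exit. (This is .NET Framework behavior; as far as I know, .NET Core and later do not run finalizers at application exit.) The Dispose method is a different story. Let's add one: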

 
    class A : IDisposable
    {
        public string a;

        public void Dispose()
        {
            Console.WriteLine("Disposing " + a);
        }

        ~A()
        {
            Console.WriteLine("Finalize " + a);
        }
    }

If you run it, the output is exactly the same. So unless you call Dispose manually, or call it from the finalizer, the garbage collector won't run it. If you have unmanaged resources that need to be released, you have to take care of that yourself.
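To make the contrast concrete, here is a minimal sketch of deterministic disposal with the using statement. The Resource and UsingDemo names are hypothetical, made up for this illustration; only IDisposable and using come from the discussion above.

```csharp
using System;

// Hypothetical type for illustration only; it holds no real resource.
class Resource : IDisposable
{
    public bool Disposed { get; private set; }

    public void Dispose()
    {
        Disposed = true;
        Console.WriteLine("Disposing");
    }
}

static class UsingDemo
{
    public static Resource Run()
    {
        var r = new Resource();
        using (r)
        {
            Console.WriteLine("Working");
        } // the compiler wraps the body in try/finally and calls r.Dispose() here
        return r;
    }

    static void Main()
    {
        Resource r = Run();
        Console.WriteLine(r.Disposed);
    }
}
```

Here Dispose runs at the closing brace of the using block, with no involvement from the garbage collector.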

Now let's see what happens if the process terminates abnormally, by forcibly terminating it with, for example, ProcessHacker [6]:

 
Starting Foo
Finishing Foo
Starting to collect
Finished collecting
Finalize b

So the finalizer for a apparently wasn't called. But we can't be sure: what if it was called after the console was no longer available? Let's modify the code so that the finalizer also writes a file:

 
        ~A()
        {
            Console.WriteLine("Finalize " + a);
            if (a == "a")
            {
                File.WriteAllText("bar", "bar");
            }
        }

Turns out, still no luck: the file is never created. And [7] supports this finding:

No "user mode" code in the process has a chance to run when a process is terminated.

Moreover, even the finalizers that do run during a graceful shutdown are limited in time [12]:

"[When a process is gracefully terminating], each Finalize method is given approximately 2 seconds to return. If a Finalize method doesn't return within 2 seconds, the CLR just kills the process - no more Finalize methods are called. Also, if it takes more than 40 seconds to call all objects' Finalize methods, then again, the CLR just kills the process. Note: These timeout values were correct at the time I wrote this text, but Microsoft might change them in the future."
- Jeffrey Richter, Applied Microsoft .NET Framework Programming, pg 467; and CLR via C#, 2nd ed, pg 478

Disposal

Disposal is required when an object holds unmanaged resources that need to be released. Suppose you open a file:

 
            var f = File.OpenWrite("foo.txt");
            Console.WriteLine(f.Handle);
            Console.ReadLine();

f.Handle will contain the OS handle for the open file. You can look it up using handle [5]:

 
FileCreate1.exe    pid: 6332   type: File            D4: foo.txt

Suppose we kill the process so that, as we found out above, not even the finalizers run. Let's see if the handle is still open.
If you run the code, kill the process, and look for the handle using

 
handle foo

you'll get no dice:

 
No matching handles found.

So the OS does indeed reclaim the handles on process termination. Another hint is that later runs eventually get the same handle ID again.

One more thing to note: if you call Dispose both manually and from the finalizer, especially in a class you plan to derive from, include a check so that a second call is a harmless no-op. Otherwise the finalizer may crash when it runs Dispose after your main code has already run it.
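The double-disposal guard can be sketched as follows. This is a minimal version of the common Dispose(bool) pattern, not code from any particular library; ResourceHolder, DerivedHolder, and the counters are hypothetical names that exist only to make the effect observable.

```csharp
using System;

// Hypothetical base class; ReleaseCount stands in for "unmanaged resource released".
class ResourceHolder : IDisposable
{
    private bool _disposed;
    public int ReleaseCount { get; private set; }

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this); // cleanup is done, so the finalizer can be skipped
    }

    protected virtual void Dispose(bool disposing)
    {
        if (_disposed) return;     // second call is a no-op
        _disposed = true;
        // Release unmanaged resources here; touch other managed
        // objects only when disposing == true.
        ReleaseCount++;
    }

    ~ResourceHolder()
    {
        Dispose(false);            // reached only if Dispose() was never called
    }
}

// Hypothetical derived class with its own guard and its own cleanup.
class DerivedHolder : ResourceHolder
{
    private bool _disposed;
    public int DerivedReleaseCount { get; private set; }

    protected override void Dispose(bool disposing)
    {
        if (!_disposed)
        {
            _disposed = true;
            DerivedReleaseCount++; // release the derived class's resources
        }
        base.Dispose(disposing);   // always chain to the base class
    }
}

static class DisposePatternDemo
{
    static void Main()
    {
        var d = new DerivedHolder();
        d.Dispose();
        d.Dispose();               // safe: both guards make this a no-op
        Console.WriteLine(d.ReleaseCount + " " + d.DerivedReleaseCount); // prints "1 1"
    }
}
```

Each class keeps its own flag, so the derived class does not depend on the base class exposing its state.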

So, we can summarize:

Finalizers are run on garbage collection, and on graceful process exit
Dispose method is not run automatically; you need to call it manually and/or from the finalizer if you need to release unmanaged resources
Finalizers are not run when the process is killed
Unmanaged resources are reclaimed by the OS on process termination
Take care of double disposal, especially when using derived classes

The conclusion would be:

You need to care about calling Dispose and releasing unmanaged resources if you create and discard objects repeatedly at runtime
You don't need to care about releasing unmanaged resources on process termination, as the OS will take care of it
Take care of double disposal, especially when using derived classes

Now let's look at the main topic.

HttpClient

The by-the-book usage looks like this: since HttpClient implements IDisposable, you wrap it in using so that Dispose is called automatically:

 
                using (var client = new HttpClient())
                {
                    var result = await client.GetAsync("https://example.com/");
                }

Suppose you're creating a bunch of requests:

 
            for (int i = 0; i < 10; i++)
            {
                using (var client = new HttpClient())
                {
                    var result = await client.GetAsync("https://example.com/");
                    Console.WriteLine(result.StatusCode);
                }
            }

The problem with this approach is that each client opens its own connection, and the socket lingers even after the application exits. This is the output of the netstat command after the process has terminated [1]:

 
  TCP    192.168.1.6:13996      93.184.216.34:http     TIME_WAIT
  TCP    192.168.1.6:13997      93.184.216.34:http     TIME_WAIT
  TCP    192.168.1.6:13998      93.184.216.34:http     TIME_WAIT
  TCP    192.168.1.6:13999      93.184.216.34:http     TIME_WAIT
  TCP    192.168.1.6:14000      93.184.216.34:http     TIME_WAIT
  TCP    192.168.1.6:14001      93.184.216.34:http     TIME_WAIT
  TCP    192.168.1.6:14002      93.184.216.34:http     TIME_WAIT
  TCP    192.168.1.6:14003      93.184.216.34:http     TIME_WAIT
  TCP    192.168.1.6:14004      93.184.216.34:http     TIME_WAIT
  TCP    192.168.1.6:14005      93.184.216.34:http     TIME_WAIT

So, as you can imagine, if you make a lot of requests this way, you're going to create a hell of a lot of sockets, even though you're supposedly disposing each client properly.

According to [2], this behavior has nothing to do with C# or disposal, but with how the OS handles sockets:

They are in the TIME_WAIT state which means that the connection has been closed on one side (ours) but we’re still waiting to see if any additional packets come in on it because they might have been delayed on the network somewhere.

Here's the diagram:

So, according to [3] and many other community resources, the general recommendation is to use one global instance of HttpClient, which is thread-safe, and not to dispose it at all. Thread safety is documented in [4]:

 
The following methods are thread safe:

CancelPendingRequests
DeleteAsync
GetAsync
GetByteArrayAsync
GetStreamAsync
GetStringAsync
PostAsync
PutAsync
SendAsync

Also stated in MSDN:

HttpClient is intended to be instantiated once and re-used throughout the life of an application. Instantiating an HttpClient class for every request will exhaust the number of sockets available under heavy loads.

According to our findings above, this should be fine: as long as we don't recreate the object at runtime, the OS will reclaim any unmanaged resources upon process termination, so disposal is not necessary.
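The single-instance recommendation can be sketched like this. SharedHttp is a hypothetical holder class introduced for this example, and the async Main signature assumes C# 7.1 or later.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical holder: one HttpClient for the whole process, never disposed.
static class SharedHttp
{
    public static readonly HttpClient Client = new HttpClient();
}

static class Program
{
    static async Task Main()
    {
        for (int i = 0; i < 10; i++)
        {
            // Every iteration goes through the same client, so the
            // underlying connection can be reused between requests.
            using (HttpResponseMessage result = await SharedHttp.Client.GetAsync("https://example.com/"))
            {
                Console.WriteLine(result.StatusCode);
            }
        }
    }
}
```

Only the response is disposed per request; the client itself lives until the process exits.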

It's also worth noting that this matters even if you're writing a command-line tool rather than an online service. If you just follow the by-the-book practice, your tool can leave a pile of lingering sockets on the system. Take a look at what the curl tool does, for example: when given several URLs in one invocation, it reuses the connection between transfers.

I was curious to look at the source of HttpClient to see what it actually does in its finalizer and/or Dispose method. Fortunately, it's available on GitHub [8]:

 
        private CancellationTokenSource _pendingRequestsCts;

        protected override void Dispose(bool disposing)
        {            
            if (disposing && !_disposed)
            {
                _disposed = true;

                // Cancel all pending requests (if any). Note that we don't call CancelPendingRequests() but cancel
                // the CTS directly. The reason is that CancelPendingRequests() would cancel the current CTS and create
                // a new CTS. We don't want a new CTS in this case.
                _pendingRequestsCts.Cancel();
                _pendingRequestsCts.Dispose();
            }

            base.Dispose(disposing);
        }

        public Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, HttpCompletionOption completionOption,
            CancellationToken cancellationToken)
        {
            if (request == null)
            {
                throw new ArgumentNullException(nameof(request));
            }
            CheckDisposed();
            CheckRequestMessage(request);

            SetOperationStarted();
            PrepareRequestMessage(request);
            // PrepareRequestMessage will resolve the request address against the base address.

            // We need a CancellationTokenSource to use with the request.  We always have the global
            // _pendingRequestsCts to use, plus we may have a token provided by the caller, and we may
            // have a timeout.  If we have a timeout or a caller-provided token, we need to create a new
            // CTS (we can't, for example, timeout the pending requests CTS, as that could cancel other
            // unrelated operations).  Otherwise, we can use the pending requests CTS directly.
            CancellationTokenSource cts;
            bool disposeCts;
            bool hasTimeout = _timeout != s_infiniteTimeout;
            if (hasTimeout || cancellationToken.CanBeCanceled)
            {
                disposeCts = true;
                cts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken, _pendingRequestsCts.Token);
                if (hasTimeout)
                {
                    cts.CancelAfter(_timeout);
                }
            }
            else
            {
                disposeCts = false;
                cts = _pendingRequestsCts;
            }

            // Initiate the send
            Task<HttpResponseMessage> sendTask = base.SendAsync(request, cts.Token);
            return completionOption == HttpCompletionOption.ResponseContentRead ?
                FinishSendAsyncBuffered(sendTask, request, cts, disposeCts) :
                FinishSendAsyncUnbuffered(sendTask, request, cts, disposeCts);
        }

OK, not much going on here apart from cancelling the pending requests. Let's look at the base class, HttpMessageInvoker [9]:

 
        public void Dispose()
        {
            Dispose(true);
            GC.SuppressFinalize(this);
        }

        protected virtual void Dispose(bool disposing)
        {
            if (disposing && !_disposed)
            {
                _disposed = true;

                if (_disposeHandler)
                {
                    _handler.Dispose();
                }
            }
        }

Nothing except a call to the HttpMessageHandler's Dispose. Also note that neither class includes a finalizer. So, according to our investigation above, the garbage collector won't dispose them if you don't call Dispose from your own code. I wonder why, actually?

Now let's look at HttpMessageHandler [10]: it's an abstract class. By default, however, HttpClient uses HttpClientHandler as its handler. Its source is not available on GitHub except for the Mono implementation [13], but we can assume the two are fairly similar.

The Mono implementation shows part of HttpClientHandler:

 
        static long groupCounter;

        public HttpClientHandler ()
        {
            allowAutoRedirect = true;
            maxAutomaticRedirections = 50;
            maxRequestContentBufferSize = int.MaxValue;
            useCookies = true;
            useProxy = true;
            connectionGroupName = "HttpClientHandler" + Interlocked.Increment (ref groupCounter);
        }

        protected override void Dispose (bool disposing)
        {
            if (disposing && !disposed) {
                Volatile.Write (ref disposed, true);
                ServicePointManager.CloseConnectionGroup (connectionGroupName);
            }

            base.Dispose (disposing);
        }

        internal virtual HttpWebRequest CreateWebRequest (HttpRequestMessage request)
        {
            var wr = new HttpWebRequest (request.RequestUri);
            wr.ThrowOnError = false;
            wr.AllowWriteStreamBuffering = false;

            wr.ConnectionGroupName = connectionGroupName;

...............................................
        }

So we can see why the recommended way to use HttpClient is to avoid recreating it and to use a single global instance instead: each handler gets its own connection group in ServicePointManager, where the connections are kept and reused, and disposing the handler closes that group. There's one problem with a global HttpClient, however: DNS changes are not honored [11], because it keeps its connections open for perf reasons.

The classic solution is to set a connection lease timeout on the service point, in milliseconds, so that connections are closed and re-established periodically:

 
var sp = ServicePointManager.FindServicePoint(new Uri("https://example.com"));
sp.ConnectionLeaseTimeout = 60*1000;
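The snippet above targets .NET Framework. On .NET Core 2.1 and later, my understanding is that HttpClient no longer goes through ServicePointManager, and the comparable knob is SocketsHttpHandler.PooledConnectionLifetime. The following is a sketch under that assumption; PooledHttp is a hypothetical holder class, and the one-minute value simply mirrors the lease timeout used above.

```csharp
using System;
using System.Net.Http;

// Hypothetical holder for a process-wide client on .NET Core 2.1 or later.
static class PooledHttp
{
    // Pooled connections are retired once they are older than one minute,
    // so new connections (and fresh DNS lookups) happen periodically.
    public static readonly SocketsHttpHandler Handler = new SocketsHttpHandler
    {
        PooledConnectionLifetime = TimeSpan.FromMinutes(1)
    };

    public static readonly HttpClient Client = new HttpClient(Handler);
}

static class Program
{
    static void Main()
    {
        Console.WriteLine(PooledHttp.Handler.PooledConnectionLifetime); // prints 00:01:00
    }
}
```

The client is still created once and shared; only the connections underneath it are recycled.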

So, we can summarize:

HttpClient is not supposed to be recreated at runtime; construct one global instance and reuse it
HttpClient is mostly thread safe
Creating lots of HttpClient objects will create lots of sockets, which is rather expensive
Disposing the global HttpClient is not needed
To solve the sticky connection issue, use a connection lease timeout

Now we know a lot of new stuff, let's go code some new services!

References
