WCF: Unable to handle load (SSL and client certificate authentication) - MaxPendingAccepts limitation

Issue Definition:

WCF is unable to handle load from multiple clients when running with transport security and the client credential type set to client certificate.

 

Symptoms:

1. From network traces, we see that the server takes too long to send the certificate request, and eventually the client gives up.

2. Furthermore, when the issue occurs, even already-connected clients see their requests take a long time to be processed.

 

From the WCF traces captured when the issue was reported:

After receiving the connection information, the WCF stack waits for some time in the stack below before erroring out.

 

at System.ServiceModel.Diagnostics.TraceUtility.TraceEvent(TraceEventType severity, Int32 traceCode, String traceDescription, TraceRecord extendedData, Object source, Exception exception)
at System.ServiceModel.Channels.HttpsChannelListener`1.ValidateAuthentication(HttpListenerContext listenerContext) <------------------
at System.ServiceModel.Channels.HttpRequestContext.ListenerHttpContext.ValidateAuthentication()
at System.ServiceModel.Channels.HttpRequestContext.ProcessAuthentication()
at System.ServiceModel.Channels.HttpChannelListener`1.HttpContextReceivedAsyncResult`1.Authenticate()
at System.ServiceModel.Channels.HttpChannelListener`1.HttpContextReceivedAsyncResult`1.ProcessHttpContextAsync()
at System.ServiceModel.Channels.HttpChannelListener`1.BeginHttpContextReceived(HttpRequestContext context, Action acceptorCallback, AsyncCallback callback, Object state)
at System.ServiceModel.Channels.SharedHttpTransportManager.EnqueueContext(IAsyncResult listenerContextResult)
at System.ServiceModel.Channels.SharedHttpTransportManager.OnGetContextCore(IAsyncResult listenerContextResult)
at System.ServiceModel.Channels.SharedHttpTransportManager.OnGetContext(IAsyncResult result)

 

 

So the question is: what is this method waiting on, and why does it not finish in time?

HttpsChannelListener`1.ValidateAuthentication(HttpListenerContext listenerContext)

 

Source code:

public override HttpStatusCode ValidateAuthentication(HttpListenerContext listenerContext)
{
    HttpStatusCode result = base.ValidateAuthentication(listenerContext);
    if (result == HttpStatusCode.OK)
    {
        if (this.shouldValidateClientCertificate)
        {
            HttpListenerRequest request = listenerContext.Request;
            X509Certificate2 certificateEx = request.GetClientCertificate();
            if (certificateEx == null) // <------------- We end up here and log the HttpsClientCertificateNotPresent trace (msdn.microsoft.com/en-US/library/System.ServiceModel.Channels.HttpsClientCertificateNotPresent.aspx)
            {
                if (this.RequireClientCertificate)
                {
                    if (DiagnosticUtility.ShouldTraceWarning) // <-------------
                    {
                        TraceUtility.TraceEvent(TraceEventType.Warning, TraceCode.HttpsClientCertificateNotPresent,
                            SR.GetString(SR.TraceCodeHttpsClientCertificateNotPresent),
                            new HttpListenerRequestTraceRecord(listenerContext.Request), this, null);
                    }
                    result = CertificateErrorStatusCode;
                }
            }
            // =======
            // Remaining code removed as noise
            // =======
    return result;
}

 

 That brings us to this line of code:

 X509Certificate2 certificateEx = request.GetClientCertificate();

What does GetClientCertificate() do?

 

This method belongs to the HttpListenerRequest class at the System.Net level:

 

public X509Certificate2 GetClientCertificate() {
    if(Logging.On)Logging.Enter(Logging.HttpListener, this, "GetClientCertificate", "");
    try {
        ProcessClientCertificate();
        GlobalLog.Print("HttpListenerRequest#" + ValidationHelper.HashString(this) + "::GetClientCertificate() returning m_ClientCertificate:" + ValidationHelper.ToString(m_ClientCertificate));
    } finally {
        if(Logging.On)Logging.Exit(Logging.HttpListener, this, "GetClientCertificate", ValidationHelper.ToString(m_ClientCertificate));
    }
    return m_ClientCertificate;
}

 

 

So now we call ProcessClientCertificate() and wait for it to finish.

The System.Net trace entries are written from this method, and in those traces we observe that WCF has at most 10 GetClientCertificate() calls in progress at any one time.

 

 

In the System.Net traces I can see the following entries:

DateTime=2015-02-15T05:57:06.5407000Z

System.Net.HttpListener Verbose: 0 : [3796] HttpListenerRequest#65752145::GetClientCertificate()

 

Here GetClientCertificate() started at 05:57:06 and finally failed at 05:59:06, so the call took 2 minutes, although sometimes it fails quickly as well.

 

DateTime=2015-02-15T05:59:06.3487000Z

System.Net.HttpListener Verbose: 0 : [3796] Exiting HttpListenerRequest#65752145::GetClientCertificate() -> (null)

 

Interestingly, as soon as the System.Net GetClientCertificate() call fails, WCF reports the failure and drops the request.

 

 

Assessment

==================

  1. The requests that failed at the WCF application level actually failed at the System.Net level when invoking GetClientCertificate(), and per the System.Net traces it looks like the client never sent the certificate.
  2. The role of MaxPendingAccepts at the WCF level when calling into the System.Net GetClientCertificate().

 

  

MaxPendingAccepts (in Framework 4.5)

Gets or sets the maximum number of channels a service can have waiting on a listener for processing incoming connections to the service.

The maximum number of channels a service can have waiting on a listener. The default value is 2 * number of processors.

This property limits the number of channels that the server can have waiting on a listener. When MaxPendingAccepts is too low, there will be a small interval of time in which all of the waiting channels have started servicing connections, but no new channels have begun listening. A connection can arrive during this interval and will fail because nothing is waiting for it on the server. This property can be configured by setting the MaxPendingConnections property to a larger number.

https://msdn.microsoft.com/en-us/library/system.servicemodel.channels.connectionorientedtransportbindingelement.maxpendingaccepts(v=vs.110).aspx

 

 

Solution

=================

Interestingly, when we moved the app to .NET Framework 4.5 and increased this value, we saw a significant improvement in load handling.
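A minimal sketch of raising the value in code, assuming the service is exposed over an HTTPS custom binding with text encoding (the original service's binding is not shown in this post, so the encoder, the helper name ClientCertBindingSketch.Create and the value 100 are all illustrative assumptions). It uses HttpsTransportBindingElement, whose MaxPendingAccepts property is available from .NET Framework 4.5:

```csharp
using System;
using System.ServiceModel.Channels;

static class ClientCertBindingSketch
{
    // Hypothetical helper: builds an HTTPS binding that requires a client
    // certificate and asks WCF to keep more accepts pending on the listener.
    public static CustomBinding Create(int maxPendingAccepts)
    {
        var transport = new HttpsTransportBindingElement
        {
            RequireClientCertificate = true,
            MaxPendingAccepts = maxPendingAccepts // only available from .NET Framework 4.5
        };

        // Text encoding is an assumption; keep whatever encoder the service already uses.
        return new CustomBinding(new TextMessageEncodingBindingElement(), transport);
    }

    static void Main()
    {
        // 100 is an illustrative value, not a recommendation; tune it under real load.
        CustomBinding binding = Create(100);
        var https = binding.Elements.Find<HttpsTransportBindingElement>();
        Console.WriteLine(https.MaxPendingAccepts);
    }
}
```

The binding returned by Create would then be passed to ServiceHost.AddServiceEndpoint in place of the existing binding. As far as I know, the same setting is also exposed in configuration as the maxPendingAccepts attribute on the httpsTransport element of a custom binding.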

Source code from Framework 4.5, which helps in handling the large load:

referencesource.microsoft.com/#System.ServiceModel/System/ServiceModel/Channels/SharedHttpTransportManager.cs,eae189be1a91debe

void StartListening()
{
    for (int i = 0; i < maxPendingAccepts; i++)
    {
        IAsyncResult result = this.BeginGetContext(true);
        if (result.CompletedSynchronously)
        {
            if (onCompleteGetContextLater == null)
            {
                onCompleteGetContextLater = new Action<object>(OnCompleteGetContextLater);
            }
            ActionItem.Schedule(onCompleteGetContextLater, result);
        }
    }
}

 

Up to .NET Framework 4.0 this value is hard-coded to 10, which is why at most 10 requests are processed at the System.Net level at any one time.

 

private void StartListening()
{
    for (int i = 0; i < 10; i++)
    {
        IAsyncResult asyncResult = this.BeginGetContext(true);
        if (asyncResult.CompletedSynchronously)
        {
            if (this.onCompleteGetContextLater == null)
            {
                this.onCompleteGetContextLater = new Action<object>(this.OnCompleteGetContextLater);
            }
            ActionItem.Schedule(this.onCompleteGetContextLater, asyncResult);
        }
    }
}
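To see why a fixed number of accept loops caps the number of in-flight certificate requests, here is a toy model. It is not WCF code: the names AcceptLoopModel and MaxConcurrentHandshakes are made up for illustration, and Thread.Sleep merely stands in for a blocking GetClientCertificate() call. Each simulated accept loop takes one queued request at a time and only picks up the next one after the slow step returns.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class AcceptLoopModel
{
    // Returns the highest number of requests that were inside the slow step at once.
    public static int MaxConcurrentHandshakes(int pendingAccepts, int requests, int handshakeMs)
    {
        var queue = new ConcurrentQueue<int>(Enumerable.Range(0, requests));
        int inFlight = 0, peak = 0;

        // One dedicated thread per simulated accept loop.
        Task[] loops = Enumerable.Range(0, pendingAccepts).Select(_ => Task.Factory.StartNew(() =>
        {
            int request;
            while (queue.TryDequeue(out request))
            {
                int now = Interlocked.Increment(ref inFlight);

                // Record a new peak with a compare-and-swap retry loop.
                int seen;
                while (now > (seen = Volatile.Read(ref peak)) &&
                       Interlocked.CompareExchange(ref peak, now, seen) != seen) { }

                Thread.Sleep(handshakeMs); // stand-in for the blocking certificate retrieval
                Interlocked.Decrement(ref inFlight);
            }
        }, TaskCreationOptions.LongRunning)).ToArray();

        Task.WaitAll(loops);
        return peak;
    }

    static void Main()
    {
        // 10 accept loops, 50 queued requests, 100 ms per simulated handshake.
        Console.WriteLine(MaxConcurrentHandshakes(10, 50, 100));
    }
}
```

Whatever the queue length, the reported peak can never exceed pendingAccepts, which mirrors the ceiling of 10 concurrent GetClientCertificate() calls seen in the System.Net traces. If each of those calls stalls for minutes, every other request waits behind them.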

 

 

 

So effectively there is no way to get rid of the above problem short of moving to .NET Framework 4.5.

Another option is to write a custom channel and handle the start event.

 

Hope this helps!