TLS handshake errors and connection timeouts? Maybe it’s the CTL engine…
Marius and Tolu here, from the Directory Services Escalation Team.
Today, we’re going to talk about a little twist on some scenarios you may have come across at some point, where TLS connections fail or time out for a variety of reasons.
You’re probably already familiar with some of the usual suspects like cipher suite mismatches, certificate validation errors and TLS version incompatibility, to name a few.
Here are just some examples for illustration (but there is a wealth of information out there):
- Troubleshooting TLS 1.2 and Certificate Issue with Microsoft Message Analyzer: A Real World Example
- TLS 1.2 handshake failure
- Troubleshooting SSL related issues (Server Certificate)
Recently we’ve seen a number of cases with a variety of symptoms affecting different customers which all turned out to have a common root cause.
We’ve managed to narrow it down to an unlikely source: a built-in OS feature working in its default configuration.
We’re talking about the automatic root update and automatic disallowed roots update mechanisms based on CTLs.
Starting with Windows Vista, root certificates are updated on Windows automatically.
When a user on a Windows client visits a secure Web site (by using HTTPS/TLS), reads a secure email (S/MIME), or downloads an ActiveX control that is signed (code signing) and encounters a certificate which chains to a root certificate not present in the root store, Windows will automatically check the appropriate Microsoft Update location for the root certificate.
If it finds it, it downloads it to the system. To the user, the experience is seamless; they don’t see any security dialog boxes or warnings and the download occurs automatically, behind the scenes.
Additional information in:
How Root Certificate Distribution Works
During TLS handshakes, any certificate chains involved in the connection will need to be validated, and, from Windows Vista/2008 onwards, the automatic disallowed root update mechanism is also invoked to verify if there are any changes to the untrusted CTL (Certificate Trust List).
A certificate trust list (CTL) is a predefined list of items that are authenticated and signed by a trusted entity.
The mechanism is described in more detail in the following article:
An automatic updater of untrusted certificates is available for Windows Vista, Windows Server 2008, Windows 7, and Windows Server 2008 R2
It expands on the automatic root update mechanism technology (for trusted root certificates) mentioned earlier to let certificates that are compromised or are untrusted in some way be specifically flagged as untrusted.
Customers therefore benefit from periodic automatic updates to both trusted and untrusted CTLs.
So, after the preamble, what scenarios are we talking about today?
Here are some examples of issues we’ve come across recently.
- Your users may experience browser errors after several seconds when trying to browse to secure (https) websites behind a load balancer.
- They might receive an error like "The page cannot be displayed. Turn on TLS 1.0, TLS 1.1, and TLS 1.2 in the Advanced settings and try connecting to https://contoso.com again. If this error persists, contact your site administrator."
- If they try to connect to the website via the IP address of the server hosting the site, the https connection works after showing a certificate name mismatch error.
- All TLS versions ARE enabled when checking in the browser settings:
[Screenshot: Internet Options]
- You have a 3rd party appliance making TLS connections to a Domain Controller via LDAPs (Secure LDAP over SSL) which may experience delays of up to 15 seconds during the TLS handshake
- The issue occurs randomly when connecting to any eligible DC in the environment targeted for authentication.
- There are no intervening devices that filter or modify traffic between the appliance and the DCs
- A very similar scenario* to the above is in fact described in the following article by our esteemed colleague, Herbert:
Understanding ATQ performance counters, yet another twist in the world of TLAs
Where he details:
DC supports LDAP over SSL/TLS
A user sends a certificate on a session. The server needs to check for certificate revocation, which may take some time.*
This becomes problematic if network communication is restricted and the DC cannot reach the Certificate Distribution Point (CDP) for a certificate.
To determine if your clients are using secure LDAP (LDAPs), check the counter “LDAP New SSL Connections/sec”.
If there are a significant number of sessions, you might want to look at CAPI-Logging.
- A 3rd party meeting server performing LDAPs queries against a Domain Controller may fail the TLS handshake on the first attempt after exceeding a pre-configured timeout (e.g. 5 seconds) on the application side
- Subsequent connection attempts are successful
So, what’s the story? Are these issues related in any way?
Well, as it turns out, they do have something in common.
As we mentioned earlier, certificate chain validation occurs during TLS handshakes.
Again, there is plenty of documentation on this subject, such as:
- TLS - SSL (Schannel SSP) Overview
- Schannel Security Support Provider Technical Reference
- How TLS/SSL Works: Logon and Authentication
- Client Certificate Authentication (Part 1)
During certificate validation operations, the CTL engine gets periodically invoked to verify if there are any changes to the untrusted CTLs.
In the example scenarios we described earlier, if the default public URLs for the CTLs are unreachable, and there is no alternative internal CTL distribution point configured (more on this in a minute), the TLS handshake will be delayed until the WinHTTP call to access the default CTL URL times out.
By default, this timeout is usually around 15 seconds, which can cause problems when load balancers or 3rd party applications are involved and have their own (more aggressive) timeouts configured.
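To put a number on the delay, a rough sketch like the following Python snippet can time the TCP connect and the TLS handshake separately from a test client. The function names and the timeout budget are our own illustrative choices, not part of any Microsoft tooling. The default context only trusts the CAs known to Python, so for a server using an internal CA you would need to load that CA into the context first.

```python
import socket
import ssl
import time


def time_tls_handshake(host, port=443, timeout=30.0, context=None):
    """Return (tcp_seconds, tls_seconds) for a single connection attempt."""
    if context is None:
        context = ssl.create_default_context()
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        tcp_done = time.monotonic()
        # wrap_socket() completes the TLS handshake before it returns.
        with context.wrap_socket(raw, server_hostname=host):
            tls_done = time.monotonic()
    return tcp_done - start, tls_done - tcp_done


def exceeds_budget(tls_seconds, client_timeout):
    """True if the handshake took longer than the caller is willing to wait."""
    return tls_seconds > client_timeout
```

For example, you could call `time_tls_handshake("contoso.com")` against an affected site and compare the second value with the load balancer or application timeout using `exceeds_budget`. A handshake that takes close to 15 seconds is consistent with the CTL retrieval timeout, but it does not prove it on its own.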
If we enable CAPI2 Diagnostic logging, we should be able to see evidence of when and why the timeouts are occurring.
We will see events like the following:
Event ID 20 – Retrieve Third-Party Root Certificate from Network:
- Trusted CTL attempt
[Screenshot: Trusted CTL Attempt]
- Disallowed CTL attempt
[Screenshot: Disallowed CTL Attempt]
Event ID 53 error message details showing that we have failed to access the disallowed CTL:
[Screenshot: Event ID 53]
The following article gives a more detailed overview of the CAPI2 diagnostics feature available on Windows systems, which is very useful when looking at any certificate validation operations occurring on the system:
Troubleshooting PKI Problems on Windows Vista
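On a test machine, enabling the log and pulling the relevant events can be scripted. The sketch below only builds and runs `wevtutil` command lines for the CAPI2 operational log. The helper names are ours, and it assumes an elevated prompt on the Windows machine being investigated.

```python
import subprocess

CAPI2_LOG = "Microsoft-Windows-CAPI2/Operational"


def enable_log_command(log=CAPI2_LOG):
    """Command line that enables the given event log channel."""
    return ["wevtutil", "sl", log, "/e:true"]


def query_events_command(event_ids, count=10, log=CAPI2_LOG):
    """Command line that prints the newest events matching the given IDs."""
    id_filter = " or ".join(f"EventID={event_id}" for event_id in event_ids)
    xpath = f"*[System[({id_filter})]]"
    return ["wevtutil", "qe", log, f"/q:{xpath}", f"/c:{count}", "/rd:true", "/f:text"]


def main():
    # Enable CAPI2 logging, then show recent Event ID 20 and 53 entries.
    subprocess.run(enable_log_command(), check=True)
    result = subprocess.run(
        query_events_command([20, 53]),
        capture_output=True,
        text=True,
        check=True,
    )
    print(result.stdout)
```

Calling `main()` after reproducing a slow handshake should list any Event ID 20 and 53 entries recorded in the meantime. CAPI2 logging can be verbose, so you may want to disable it again once you have the data you need.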
To help us confirm that the CTL updater engine is indeed affecting the TLS delays and timeouts we’ve described, we can temporarily disable it for both the trusted and untrusted CTLs and then attempt our TLS connections again.
To disable it:
- Create a backup of this registry key (export and save a copy)
- Then create the following DWORD registry values under the key
After applying these steps, you should find that your previously failing TLS connections no longer time out. Your symptoms may vary slightly, but you should see speedier connection times, because we have eliminated the delay in trying and failing to reach the CTL URLs.
So, what now?
We should now REVERT the above registry changes by restoring the backup we created, and evaluate the following, more permanent solutions.
We previously stated that disabling the updater engine should only be a temporary measure to confirm the root cause of the timeouts in the above scenarios.
- For the untrusted CTL:
- The automatic disallowed root update mechanism is a built-in OS feature, so we can consider allowing access to the public Microsoft disallowed CTL URL from users’ machines: https://ctldl.windowsupdate.com/msdownload/update/v3/static/trustedr/en/disallowedcertstl.cab
- OR, we can configure and maintain an internal untrusted CTL distribution point as outlined in Configure Trusted Roots and Disallowed Certificates
- For the trusted CTL:
- For server systems, you might consider deploying the trusted 3rd party CA certificates via GPO on an as-needed basis, as described in Manage Trusted Root Certificates (particularly to avoid hitting the TLS protocol limitation described in SSL/TLS communication problems after you install KB 931125)
- For client systems, you should consider:
- Allowing access to the public Microsoft trusted CTL URL: https://ctldl.windowsupdate.com/msdownload/update/v3/static/trustedr/en/authrootstl.cab
- OR, defining and maintaining an internal trusted CTL distribution point as outlined in Configure Trusted Roots and Disallowed Certificates
- If you require more granular control of which CAs are trusted by client machines, you can deploy the 3rd party CA certificates as needed via GPO, as described in Manage Trusted Root Certificates
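Before choosing between these options, it can help to check whether the public CTL URLs are reachable from an affected machine at all. The following is a minimal sketch, not a definitive test: the URLs are copied from the list above, the function names and the 5 second timeout are our own assumptions, and Python uses its own proxy settings rather than the WinHTTP configuration the CTL engine relies on, so results may differ from what the OS sees.

```python
import urllib.error
import urllib.request

# URLs as listed in this article.
CTL_URLS = {
    "trusted": "https://ctldl.windowsupdate.com/msdownload/update/v3/static/trustedr/en/authrootstl.cab",
    "disallowed": "https://ctldl.windowsupdate.com/msdownload/update/v3/static/trustedr/en/disallowedcertstl.cab",
}


def check_url(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return (reachable, detail) for one URL; never raises on network errors."""
    try:
        with opener(url, timeout=timeout) as response:
            return True, f"HTTP {response.status}"
    except (urllib.error.URLError, OSError) as exc:
        return False, str(exc)


def report(urls=CTL_URLS, timeout=5.0):
    """Print a one-line reachability result per CTL URL."""
    for name, url in urls.items():
        reachable, detail = check_url(url, timeout=timeout)
        status = "reachable" if reachable else "NOT reachable"
        print(f"{name} CTL: {status} ({detail})")
```

Running `report()` on a client or server that shows the delays gives a quick first indication. If both URLs are unreachable and opening them up is not an option, the internal CTL distribution point or GPO-based approaches above are the ones to evaluate.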
So there you have it. We hope you found this interesting, and now have an additional factor to take into account when troubleshooting TLS/SSL communication failures.