ADFS Server suddenly starts failing exactly 1 month after a certificate renewal

zachary cohan 6 Reputation points
2022-09-13T00:23:14.483+00:00

About a month ago, my colleague and I renewed a certificate, and everything went relatively okay. We have been running smoothly for the 31 days since the old cert expired.

We got an SSL certificate from a CA, installed it into the certificate store of our ADFS server, bound the cert in IIS, and set ADFS to use the new cert (service communications, token-signing, token-decrypting).

(Please note, we aren't ADFS experts. We tried our best, but apparently we did something wrong.)

Despite everything working smoothly for the past month, everything has broken now.
Here are the symptoms:
-Accessing any of our ADFS endpoints from the browser results in a 503 error
-The ADFS service is not running and trying to start the service results in an Error 1064.
240258-image.png

When we look in the certificates store, we can see that the new certificate is installed. Here is what is interesting:
Originally, when we were looking at the errors in the server manager, we were getting pairs of "381 errors" and "102 errors". (These errors showed up after a restart of the server)
The "381" error reads :
An error occurred during an attempt to build the certificate chain for configuration certificate identified by thumbprint 'XXXXXXXXXXXD5E85AED342A39EC63523F6AF55AF2'. Possible causes are that the certificate has been revoked or certificate is not within its validity period.
The following errors occurred while building the certificate chain:
MSIS2013: A required certificate is not within its validity period when verifying against the current system clock.

The thumbprint that is being referenced is the thumbprint of the OLD cert that had already expired. Naively, we decided to just delete the old certificate from the cert store (maybe then adfs will be forced to use the new cert?... but no)

After deleting the expired cert from the store, we would get pairs of "249 warnings" and "102 errors".
The 249 warning states:
The certificate identified by thumbprint 'XXXXXXXXXXXD5E85AED342A39EC63523F6AF55AF2' could not be found in the certificate store. In certificate rollover scenarios, this can potentially cause a failure when the Federation Service is signing or decrypting using this certificate.

So it seems that the ADFS server is still TRYING to use the old cert to build the certificate chain, regardless of whether it is available in the store.
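One way to confirm that every logged event points at the same old thumbprint is to pull the thumbprints out of the event text and count them. The following is only a rough sketch: it assumes the event messages have already been copied out of the AD FS Admin log as plain strings, and the function names `thumbprints_in_events` and `unexpected_thumbprints` are made up for illustration.

```python
import re
from collections import Counter

# Matches the quoted thumbprint in AD FS event text such as
# "... identified by thumbprint 'ABC123' ...".
THUMBPRINT_RE = re.compile(r"thumbprint '([0-9A-Za-z]+)'")


def thumbprints_in_events(messages):
    """Count how often each thumbprint is mentioned across event messages."""
    counts = Counter()
    for message in messages:
        for thumbprint in THUMBPRINT_RE.findall(message):
            counts[thumbprint.upper()] += 1
    return counts


def unexpected_thumbprints(messages, new_thumbprint):
    """Return thumbprints mentioned in events that are not the new cert's."""
    wanted = new_thumbprint.replace(" ", "").upper()
    return sorted(t for t in thumbprints_in_events(messages) if t != wanted)
```

If `unexpected_thumbprints` returns only the old cert's thumbprint, the events are consistent with the AD FS configuration still referencing the expired cert rather than with a problem in the new one.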

We tried to use PowerShell to force ADFS to use the other cert:
Set-AdfsSslCertificate -Thumbprint "<THUMBPRINT_OF_CORRECT_CERT>"

but this resulted in getting the following message:
Get-AdfsCertificate : Could not connect to net.tcp://localhost:1500/policy. The connection attempt lasted for a time
span of 00:00:02.0499038. TCP error code 10061: No connection could be made because the target machine actively
refused it 127.0.0.1:1500.
At line:1 char:1
+ Get-AdfsCertificate
+ ~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : OpenError: (:) [Get-AdfsCertificate], EndpointNotFoundException
    + FullyQualifiedErrorId : Could not connect to net.tcp://localhost:1500/policy. The connection attempt lasted for
   a time span of 00:00:02.0499038. TCP error code 10061: No connection could be made because the target machine
   actively refused it 127.0.0.1:1500. ,Microsoft.IdentityServer.Management.Commands.GetCertificateCommand

Thanks in advance for any help. We really are at a loss. It has been working fine for a month (exactly 1 month, if that is a clue to anyone more knowledgeable than myself), but we can't quite pin down the problem

Windows Server 2019
Active Directory Federation Services

3 answers

  1. Pierre Audonnet - MSFT 10,166 Reputation points Microsoft Employee
    2022-09-14T22:34:11.567+00:00

    The timing of exactly one month is interesting as it might be an issue with the CRL being expired or something. But that might also be anecdotal.

    Can you check whether the HTTP bindings still use the old cert? You can do that with netsh http show sslcert. If the old one is still there, you can update it with netsh too: netsh http add sslcert ipport=<ADFS URL>:443 certhash=<hash of the TLS cert> appid={5d89a20c-beab-4389-9447-324788eb944a}
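The binding check described above can be sketched as a small script that scans saved `netsh http show sslcert` output for bindings whose certificate hash differs from the new cert's thumbprint. This is a minimal sketch, not a definitive tool: the function names, the `adfs.example.com` hostname, and the all-1s and all-2s hashes in `SAMPLE` are made-up placeholders, and the parser assumes the English "Name : value" layout of the netsh output.

```python
def parse_sslcert_bindings(netsh_output):
    """Parse `netsh http show sslcert` text into a list of binding dicts."""
    bindings = []
    current = None
    for line in netsh_output.splitlines():
        key, sep, value = line.partition(" : ")
        if not sep:
            continue  # headings, separators, and blank lines
        key, value = key.strip(), value.strip()
        if key in ("Hostname:port", "IP:port"):
            # Each binding record starts with its endpoint line.
            current = {"endpoint": value}
            bindings.append(current)
        elif current is not None and key == "Certificate Hash":
            current["hash"] = value.lower()
    return bindings


def stale_bindings(netsh_output, expected_thumbprint):
    """Return endpoints whose bound cert hash is not the expected thumbprint."""
    expected = expected_thumbprint.replace(" ", "").lower()
    return [
        b["endpoint"]
        for b in parse_sslcert_bindings(netsh_output)
        if b.get("hash") != expected
    ]


# Illustrative sample only; the hostname and hashes are placeholders.
SAMPLE = """\
SSL Certificate bindings:
-------------------------

    Hostname:port                : adfs.example.com:443
    Certificate Hash             : 1111111111111111111111111111111111111111
    Application ID               : {5d89a20c-beab-4389-9447-324788eb944a}
    Certificate Store Name       : MY

    Hostname:port                : localhost:443
    Certificate Hash             : 2222222222222222222222222222222222222222
    Application ID               : {5d89a20c-beab-4389-9447-324788eb944a}
    Certificate Store Name       : MY
"""
```

On the AD FS server itself, the text could be captured with `subprocess.run(["netsh", "http", "show", "sslcert"], capture_output=True, text=True).stdout` and passed to `stale_bindings` together with the new cert's thumbprint. Any endpoint it returns is a binding worth inspecting by hand.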


  2. Limitless Technology 43,926 Reputation points
    2022-09-19T06:52:19.103+00:00

    Hello there,

    To be honest, finding the exact cause of this behavior may take some work. You can start by checking whether the AD FS configuration database is running.

    -If you are using Windows Internal Database (WID) as an AD FS configuration database, open services.msc, and check whether the Windows Internal Database service is running.

    -If you are using the SQL Server service as an AD FS configuration database, open services.msc. Check whether the SQL Server service is running. You can also create a Test.udl file and populate the connection string to test connectivity to Microsoft SQL Server.

    Here is a link with troubleshooting steps: "AD FS 2.0 service fails to start", https://learn.microsoft.com/en-us/troubleshoot/windows-server/identity/adfs-2-service-fails-to-start



  3. Jeffrey Bostoen 0 Reputation points
    2023-04-21T06:26:35.6966667+00:00

    I'm running into the same issue. Like some other folks, I already removed some old certificates. Most comments in other forums suggest cmdlets, which end in "Could not connect to net.tcp://localhost:1500/policy." anyway. Any ideas?