SFB2019: Get-CsPoolFabricState shows Health: Warning but everything appears ok

Daniel A 106 Reputation points
2021-01-20T01:05:45.38+00:00

Hi All,

When I run Get-CsPoolFabricState against my Skype for Business 2019 pool, the bottom of the output shows:

*Pool All Server and Services Summary:
Fqdn: SKYPE19-FE-BD1.EXMAPLE.COM Primary: 34 Secondary: 70
Fqdn: SKYPE19-FE-BD2.EXMAPLE.COM Primary: 35 Secondary: 69
Fqdn: SKYPE19-FE-BD3.EXMAPLE.COM Primary: 35 Secondary: 69
WARNING: Fqdn: SKYPE19-FE-BD1.EXMAPLE.COM - Health: Warning Status: Up [Seed Node] Primary: 3 Secondary: 0
WARNING: Fqdn: SKYPE19-FE-BD2.EXMAPLE.COM - Health: Warning Status: Up [Seed Node] Primary: 0 Secondary: 3
WARNING: Fqdn: SKYPE19-FE-BD3.EXMAPLE.COM - Health: Warning Status: Up [Seed Node] Primary: 0 Secondary: 3
WARNING: One or more servers are shutdown, unhealthy or deactivated.  Ensure they are running and activated.  Restart the server if problems persist.*

Has anyone seen this before? I cannot find any warning events in the event log (the Lync Server log is clean). Everything with the pool appears to be working correctly, so I have no idea what it considers to be in a warning state.

I have run

Reset-CsPoolRegistrarState -ResetType:"FullReset"

This completed fine, but the warning persists (it has been 8 hours since I ran the full reset).


Accepted answer
  1. Daniel A 106 Reputation points
    2021-01-22T03:04:12.4+00:00

    So I finally figured out what this was. In my case, the warning was caused by my server's Default certificate, which was still valid but due to expire within 90 days.

    Here is how I debugged it (as far as I can find, this is not documented anywhere):

    From PowerShell on one of the Front End nodes, run:

    PS> Connect-ServiceFabricCluster  
      
    PS> Get-ServiceFabricClusterHealth  
      
      
    AggregatedHealthState   : Warning  
    UnhealthyEvaluations    :  
                              Unhealthy nodes: 100% (3/3), MaxPercentUnhealthyNodes=0%.  
      
                              Unhealthy node: NodeName='SKYPE19-FE-BD2.EXMAPLE.COM', AggregatedHealthState='Warning'.  
      
                                    Unhealthy event: SourceId='System.FabricNode', Property='Certificate_cluster', HealthState='Warning', ConsiderWarningAsError=false.  
      
                              Unhealthy node: NodeName='SKYPE19-FE-BD3.EXMAPLE.COM', AggregatedHealthState='Warning'.  
      
                                    Unhealthy event: SourceId='System.FabricNode', Property='Certificate_cluster', HealthState='Warning', ConsiderWarningAsError=false.  
      
                              Unhealthy node: NodeName='SKYPE19-FE-BD1.EXMAPLE.COM', AggregatedHealthState='Warning'.  
      
                                    Unhealthy event: SourceId='System.FabricNode', Property='Certificate_cluster', HealthState='Warning', ConsiderWarningAsError=false.  
      
    NodeHealthStates        :  
                              NodeName              : SKYPE19-FE-BD2.EXMAPLE.COM  
                              AggregatedHealthState : Warning  
      
                              NodeName              : SKYPE19-FE-BD3.EXMAPLE.COM  
                              AggregatedHealthState : Warning  
      
                              NodeName              : SKYPE19-FE-BD1.EXMAPLE.COM  
                              AggregatedHealthState : Warning  
      
    ApplicationHealthStates :  
                              ApplicationName       :  
                              AggregatedHealthState : Ok  
      
                              ApplicationName       : fabric:/System  
                              AggregatedHealthState : Ok  
      
    HealthEvents            : None  
      
    

    To drill down into the warning, run the following (you can target any of the nodes in the output above):

    PS> Get-ServiceFabricNodeHealth -NodeName "SKYPE19-FE-BD2.EXMAPLE.COM"  
      
      
    NodeName              : SKYPE19-FE-BD2.EXMAPLE.COM  
    AggregatedHealthState : Warning  
    UnhealthyEvaluations  :  
                            Unhealthy event: SourceId='System.FabricNode', Property='Certificate_cluster', HealthState='Warning', ConsiderWarningAsError=false.  
      
    HealthEvents          :  
                            SourceId              : System.FabricNode  
                            Property              : Certificate_cluster  
                            HealthState           : Warning  
                            SequenceNumber        : 132555760970446011  
                            SentAt                : 20/01/2021 12:28:17 AM  
                            ReceivedAt            : 20/01/2021 12:30:17 AM  
                            TTL                   : Infinite  
                            Description           : Certificate expiration: (2021-04-15 06:02:00.000, 81f707cef7d097bc1a0db3c32c213486f501129e)  
                            RemoveWhenExpired     : False  
                            IsExpired             : False  
                            Transitions           : Ok->Warning = 20/01/2021 12:30:17 AM, LastError = 1/01/0001 12:00:00 AM  
      
                            SourceId              : System.FM  
                            Property              : State  
                            HealthState           : Ok  
                            SequenceNumber        : 2  
                            SentAt                : 20/01/2021 12:29:13 AM  
                            ReceivedAt            : 20/01/2021 12:30:46 AM  
                            TTL                   : Infinite  
                            Description           : Fabric node is up.  
                            RemoveWhenExpired     : False  
                            IsExpired             : False  
                            Transitions           : Warning->Ok = 20/01/2021 12:30:46 AM, LastError = 1/01/0001 12:00:00 AM  
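
The per-node check above can be repeated for every node in one pass. The following is a minimal sketch, not a documented procedure: the function name Get-NonOkNodeEvent is hypothetical, and it assumes each health event exposes its details under a HealthInformation property (as the System.Fabric.Health.HealthEvent type does).

```powershell
# Sketch only: flatten the non-Ok health events from one or more node health objects.
# Get-NonOkNodeEvent is a hypothetical helper name, not part of any Microsoft module.
function Get-NonOkNodeEvent {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        $NodeHealth
    )
    process {
        foreach ($evt in $NodeHealth.HealthEvents) {
            # Assumption: event details live under .HealthInformation
            $info = $evt.HealthInformation
            if ("$($info.HealthState)" -ne 'Ok') {
                [pscustomobject]@{
                    NodeName    = $NodeHealth.NodeName
                    SourceId    = $info.SourceId
                    Property    = $info.Property
                    HealthState = "$($info.HealthState)"
                    Description = $info.Description
                }
            }
        }
    }
}

# Usage on a Front End server (needs the Service Fabric PowerShell module):
# Connect-ServiceFabricCluster | Out-Null
# Get-ServiceFabricNode |
#     ForEach-Object { Get-ServiceFabricNodeHealth -NodeName $_.NodeName } |
#     Get-NonOkNodeEvent | Format-Table -AutoSize
```

Keeping the filter in a small function means it can be exercised against mock objects without touching a live pool.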
    

    Simply replacing the certificate using Set-CsCertificate (https://learn.microsoft.com/en-us/powershell/module/skype/set-cscertificate?view=skype-ps) or the GUI, if that's your thing, caused the warnings to clear up.
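
To confirm which certificate a warning refers to before replacing it, the Description field of the health event can be parsed for the expiry time and thumbprint. This is a rough sketch under assumptions: the function name Get-CertExpiryFromDescription is hypothetical, the description format is assumed to match the output shown earlier, and the source does not say whether the timestamp is UTC or local time.

```powershell
# Sketch only: extract expiry time and thumbprint from a Certificate_cluster
# health event description. Get-CertExpiryFromDescription is a hypothetical name.
function Get-CertExpiryFromDescription {
    param(
        [Parameter(Mandatory)]
        [string]$Description,
        # Reference time for the days-left calculation; defaults to now
        [datetime]$Now = (Get-Date)
    )
    # Assumed format: "Certificate expiration: (yyyy-MM-dd HH:mm:ss.fff, <thumbprint>)"
    $pattern = 'Certificate expiration: \((?<when>[\d\-]+ [\d:\.]+), (?<thumb>[0-9a-fA-F]+)\)'
    if ($Description -match $pattern) {
        $expires = [datetime]::ParseExact(
            $Matches['when'], 'yyyy-MM-dd HH:mm:ss.fff',
            [cultureinfo]::InvariantCulture)
        [pscustomobject]@{
            Thumbprint = $Matches['thumb'].ToUpper()
            Expires    = $expires
            DaysLeft   = [math]::Floor(($expires - $Now).TotalDays)
        }
    }
}

# Usage: compare the result with the certificate assigned to the Default usage.
# Get-CsCertificate -Type Default | Select-Object Thumbprint, NotAfter
```

If the thumbprint matches the one reported by Get-CsCertificate, that is the certificate to renew.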

    Hopefully this helps someone else.


3 additional answers

  1. Sharon Zhao-MSFT 25,056 Reputation points Microsoft Vendor
    2021-01-20T08:44:21.177+00:00

    @Daniel A ,

    Have you made any changes to your environment recently?

    You can check whether all the servers in your environment are running properly in the Skype for Business Server Control Panel, as in this screenshot:
    [Image: 58572-image.png]

    For reference, you could read this article on troubleshooting a Front End service that cannot be started.




  2. Daniel A 106 Reputation points
    2021-01-20T22:23:33.05+00:00

    No changes, but I recently had a certificate expire. That certificate is now fixed. Everything in the Control Panel is green.

    58835-image.png


  3. Daniel A 1 Reputation point
    2021-07-22T23:30:55.35+00:00

    You can also clear these alerts by running the following for each node:

    Send-ServiceFabricNodeHealthReport -NodeName:'SKYPE19-FE-BD2.EXMAPLE.COM' -SourceId:'System.FabricNode' -HealthProperty:'Certificate_cluster' -HealthState Ok -Description "Upcoming cert expiry accepted"
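
To avoid typing that command once per node, the parameters can be generated for every node and splatted. This is a sketch, assuming the same parameter values as the command above; the helper name New-CertWarningOverride is hypothetical. Note that this only overrides the reported health state, and the certificate itself still needs to be renewed before it expires.

```powershell
# Sketch only: build one parameter set per node for Send-ServiceFabricNodeHealthReport.
# New-CertWarningOverride is a hypothetical helper name.
function New-CertWarningOverride {
    param(
        [Parameter(Mandatory)]
        [string[]]$NodeName,
        [string]$Description = 'Upcoming cert expiry accepted'
    )
    foreach ($n in $NodeName) {
        # Emit a hashtable suitable for splatting
        @{
            NodeName       = $n
            SourceId       = 'System.FabricNode'
            HealthProperty = 'Certificate_cluster'
            HealthState    = 'Ok'
            Description    = $Description
        }
    }
}

# Usage on a Front End server (needs the Service Fabric PowerShell module):
# Connect-ServiceFabricCluster | Out-Null
# $nodes = (Get-ServiceFabricNode).NodeName
# New-CertWarningOverride -NodeName $nodes | ForEach-Object {
#     $p = $_
#     Send-ServiceFabricNodeHealthReport @p
# }
```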
    