Problems with DFSR SYSVOL, NETLOGON replication

Andy Baravi 21 Reputation points
2023-01-08T05:13:00.467+00:00

Hello everyone,

I have two Windows Server 2016 DCs that replicate SYSVOL with DFSR. I have a dilemma: I do not know which of the DCs is at fault.

DC-001, which holds the FSMO roles, reports the following errors:

  Starting test: FrsEvent  

     * The File Replication Service Event log test   
     Skip the test because the server is running DFSR.  

     ......................... DC-001 passed test FrsEvent  

  Starting test: DFSREvent  

     The DFS Replication Event Log.   
     There are warning or error events within the last 24 hours after the SYSVOL has been shared.  Failing SYSVOL  

     replication problems may cause Group Policy problems.   
     A warning event occurred.  EventID: 0x80001396  

        Time Generated: 01/04/2023   20:34:34  

        Event String:  

        The DFS Replication service is stopping communication with partner DC-000 for replication group Domain System Volume due to an error. The service will retry the connection periodically.   

           

        Additional Information:   

        Error: 9033 (The request was cancelled by a shutdown)   

        Connection ID: D8098552-5382-4E6B-9107-4AA61EC2F9A0   

        Replication Group ID: 2A016BE6-ACDC-4A11-9B2A-8D96BC15495D  

     A warning event occurred.  EventID: 0x80001396  

        Time Generated: 01/04/2023   20:49:04  

        Event String:  

        The DFS Replication service is stopping communication with partner DC-000 for replication group Domain System Volume due to an error. The service will retry the connection periodically.   

           

        Additional Information:   

        Error: 9033 (The request was cancelled by a shutdown)   

        Connection ID: D8098552-5382-4E6B-9107-4AA61EC2F9A0   

        Replication Group ID: 2A016BE6-ACDC-4A11-9B2A-8D96BC15495D  

     An error event occurred.  EventID: 0xC000138A  

        Time Generated: 01/04/2023   20:49:38  

        Event String:  

        The DFS Replication service encountered an error communicating with partner DC-000 for replication group Domain System Volume.   

======================================================================
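The EventID values that dcdiag prints are full 32-bit event identifiers, not the short IDs shown in Event Viewer. As a minimal sketch (the function name `decode_event_id` is hypothetical, and the bit layout assumed here is the standard Windows one: severity in the top two bits, event code in the low 16 bits), they can be decoded like this:

```python
from typing import Tuple

# Severity values stored in the top two bits of a Windows event identifier.
SEVERITY_NAMES = {0: "success", 1: "informational", 2: "warning", 3: "error"}


def decode_event_id(raw: str) -> Tuple[int, str]:
    """Split a hex event identifier into (event ID, severity)."""
    value = int(raw, 16)
    severity = SEVERITY_NAMES[(value >> 30) & 0x3]  # top two bits
    event_id = value & 0xFFFF                       # low 16 bits
    return event_id, severity


for raw in ("0x80001396", "0xC000138A"):
    print(raw, decode_event_id(raw))
```

The decoded numbers are the ones to search for in the DFS Replication event log. The severities agree with the "warning event" and "error event" wording in the dcdiag output above.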

The second DC, DC-000, reports a different error:

  Starting test: FrsEvent  

     * The File Replication Service Event log test   
     Skip the test because the server is running DFSR.  

     ......................... DC-000 passed test FrsEvent  

  Starting test: DFSREvent  

     The DFS Replication Event Log.   
     There are warning or error events within the last 24 hours after the SYSVOL has been shared.  Failing SYSVOL  

     replication problems may cause Group Policy problems.   
     A warning event occurred.  EventID: 0x800008A5  

        Time Generated: 01/04/2023   20:34:41  

        Event String:  

        The DFS Replication service stopped replication on volume C:. This occurs when a DFSR JET database is not shut down cleanly and Auto Recovery is disabled. To resolve this issue, back up the files in the affected replicated folders, and then use the ResumeReplication WMI method to resume replication.   

           

        Additional Information:   

        Volume: C:   

        GUID: 14B12066-B2F6-11E4-93EB-806E6F6E6963   

           

        Recovery Steps   

        1. Back up the files in all replicated folders on the volume. Failure to do so may result in data loss due to unexpected conflict resolution during the recovery of the replicated folders.   

        2. To resume the replication for this volume, use the WMI method ResumeReplication of the DfsrVolumeConfig class. For example, from an elevated command prompt, type the following command:   

        wmic /namespace:\\root\microsoftdfs path dfsrVolumeConfig where volumeGuid="14B12066-B2F6-11E4-93EB-806E6F6E6963" call ResumeReplication   

           

        For more information, see http://support.microsoft.com/kb/2663685.  

     A warning event occurred.  EventID: 0x800008A5  

==============================================================================================================================
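The recovery steps in the event text tie the ResumeReplication call to the volume GUID reported in the same event. A small sketch, assuming the event text has been saved as a string (the helper name `resume_replication_command` is hypothetical), that pulls out the GUID and fills in the wmic command quoted in the event. It only builds the command line and does not run it. Step 1 of the recovery steps (back up the replicated folders) still comes first.

```python
import re

# Matches the "GUID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" line of the event.
GUID_RE = re.compile(
    r"GUID:\s*([0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12})"
)


def resume_replication_command(event_text: str) -> str:
    """Build the wmic ResumeReplication command for the volume in the event."""
    match = GUID_RE.search(event_text)
    if match is None:
        raise ValueError("no volume GUID found in event text")
    guid = match.group(1)
    return (
        r"wmic /namespace:\\root\microsoftdfs path dfsrVolumeConfig "
        f'where volumeGuid="{guid}" call ResumeReplication'
    )


sample = "Volume: C:\nGUID: 14B12066-B2F6-11E4-93EB-806E6F6E6963\n"
print(resume_replication_command(sample))
```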

What should I do? I was thinking of demoting one of them, but I do not know which one is at fault.

Thanks in advance,
Andy

Windows Server Migration

Accepted answer
  1. Dave Patrick 426.1K Reputation points MVP
    2023-01-08T19:44:09.893+00:00

    I reinstalled the domain controller in question. After promoting DC-000, the SYSVOL shares are not even created, and there is no replication from our main DC-001 to DC-000.

    OK, then the logs still report some errors from the last 24 hours, which is causing some confusion. These two are (or at least were) problematic:

    server has been disconnected from other partners for 172 days, which is longer than the time allowed by the MaxOfflineTimeInDays parameter (60)

    This indicates a tombstoned condition, where you generally need to remove the server from the network and then perform cleanup before adding a new one back:
    https://learn.microsoft.com/en-us/windows-server/identity/ad-ds/deploy/ad-ds-metadata-cleanup
    https://techcommunity.microsoft.com/t5/itops-talk-blog/step-by-step-manually-removing-a-domain-controller-server/ba-p/280564
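The quoted message boils down to a comparison between the days a member has been disconnected and the MaxOfflineTimeInDays limit. A minimal sketch of that arithmetic, using the figures from the message (the function name `days_over_limit` is hypothetical and not part of any DFSR tooling):

```python
def days_over_limit(days_disconnected: int, max_offline_days: int = 60) -> int:
    """Return how many days past the limit the member is (0 if within it)."""
    return max(0, days_disconnected - max_offline_days)


overdue = days_over_limit(172)  # 172 days offline, limit of 60 from the log
print(f"over the limit by {overdue} days" if overdue else "within the limit")
```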

    There are no more endpoints available from the endpoint mapper

    This one was most likely the result of DFSR trying to open a connection to the other server and failing. On each retry the endpoint mapper uses another dynamic port, and so on until all dynamic ports are consumed.

    I'd suggest moving the roles off (if needed) then removing the new one again. Then confirm health is 100% (dcdiag, repadmin, System and DFS Replication event logs are free of errors) before adding another. Put up some new log files to look at if problems persist.
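One way to confirm that dcdiag is clean before adding another DC is to scan its saved output for failed tests. This is a rough sketch, assuming the summary lines follow the "DC-001 passed test FrsEvent" shape seen in the question, with "failed" in place of "passed" for a failing test. The function name `failed_tests` and the sample text are illustrative only.

```python
import re
from typing import List, Tuple

# Summary lines look like: "......................... DC-001 passed test FrsEvent"
RESULT_RE = re.compile(r"\.{5,}\s+(\S+)\s+(passed|failed)\s+test\s+(\S+)")


def failed_tests(dcdiag_output: str) -> List[Tuple[str, str]]:
    """Return (server, test name) for every test reported as failed."""
    return [
        (m.group(1), m.group(3))
        for m in RESULT_RE.finditer(dcdiag_output)
        if m.group(2) == "failed"
    ]


# Illustrative sample modeled on the output in the question, not a real log.
sample = (
    "     ......................... DC-001 passed test FrsEvent  \n"
    "     ......................... DC-001 failed test DFSREvent  \n"
)
print(failed_tests(sample))
```

An empty list from dcdiag is only one of the checks listed above. The repadmin output and the System and DFS Replication event logs still need to be reviewed separately.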

    It's possible that DC-001 may need an authoritative restore after you remove DC-000. The System and DFS Replication event logs will tell us this.
    https://learn.microsoft.com/en-US/troubleshoot/windows-server/group-policy/force-authoritative-non-authoritative-synchronization



3 additional answers

  1. Dave Patrick 426.1K Reputation points MVP
    2023-01-08T13:49:01.953+00:00

    Please run:

    Dcdiag /v /c /d /e /s:%computername% >C:\dcdiag.log (run on PDC emulator)
    repadmin /showrepl >C:\repl.txt (run on any domain controller)
    ipconfig /all > C:\%computername%.txt (run on EVERY domain controller)

    Also check the domain controller System and Replication (DFS or FRS) event logs for errors since last boot. Post the Event Source and Event IDs of any found. (no evtx files)

    Then put the unzipped text files up on OneDrive and share a link.
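The three commands above can also be collected from a small script. This is a sketch under stated assumptions: the helper names `diagnostic_commands` and `run_all` are hypothetical, the output paths mirror the ones in the commands above, and nothing is executed until `run_all` is called on a domain controller from an elevated prompt.

```python
import subprocess
from typing import List, Tuple


def diagnostic_commands(computer_name: str) -> List[Tuple[List[str], str]]:
    """Return (argument list, output file) pairs for the requested logs."""
    return [
        # Run on the PDC emulator.
        (["dcdiag", "/v", "/c", "/d", "/e", f"/s:{computer_name}"], "C:\\dcdiag.log"),
        # Run on any domain controller.
        (["repadmin", "/showrepl"], "C:\\repl.txt"),
        # Run on every domain controller.
        (["ipconfig", "/all"], "C:\\" + computer_name + ".txt"),
    ]


def run_all(computer_name: str) -> None:
    """Run each command and write its standard output to the matching file."""
    for argv, out_path in diagnostic_commands(computer_name):
        with open(out_path, "w") as out_file:
            subprocess.run(argv, stdout=out_file, check=False)
```

On a domain controller, `run_all(os.environ["COMPUTERNAME"])` (after `import os`) would stand in for `%computername%` in the original commands.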


  2. Andy Baravi 21 Reputation points
    2023-01-08T19:21:30.083+00:00

    Patrick, thank you for answering.

    Update:

    DC-001 is our primary (FSMO) DC.
    I decided to demote DC-000 as a domain controller.
    I reinstalled the domain controller in question. After promoting DC-000, the SYSVOL shares are not even created, and there is no replication from our main DC-001 to DC-000.

    Finally I generated the logs as per your request:
    https://ourvolaris-my.sharepoint.com/:f:/g/personal/andy_baravi_portfolioplus_com/Eloz6Xuh3yFNqBYVJMh8Ro4BnklZ6hFGW0J1V9EosZ-7tA?e=QyZNaa


  3. Andy Baravi 21 Reputation points
    2023-01-08T23:54:27.563+00:00

    Hi Patrick,

    Thank you for helping on this one.

    I changed MaxOfflineTimeInDays from the default value of 60 days to 200 days. I restarted the newly promoted domain controller and replication picked up. Everything is OK now.

    Thanks for your help!
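The post does not say how MaxOfflineTimeInDays was changed. One commonly documented route is a wmic call against the DfsrMachineConfig class in the same `\\root\microsoftdfs` namespace that the ResumeReplication command in the question uses. The sketch below only builds that command line. The helper name `set_max_offline_command` is hypothetical, and the exact syntax should be treated as an assumption to verify against Microsoft's documentation before running it. Raising the limit to 200 clears the check because the server had been disconnected for 172 days.

```python
def set_max_offline_command(days: int) -> str:
    """Build a wmic command line that sets MaxOfflineTimeInDays (assumed syntax)."""
    if not isinstance(days, int) or days <= 0:
        raise ValueError("days must be a positive integer")
    return (
        r"wmic.exe /namespace:\\root\microsoftdfs path DfsrMachineConfig "
        f"set MaxOfflineTimeInDays={days}"
    )


print(set_max_offline_command(200))
```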