Troubleshooting File Replication Service
File Replication Service (FRS) supports a multimaster file replication model in which any computer can originate or accept changes to any other computer taking part in the replication configuration. Before you troubleshoot FRS problems, understand the following characteristics of multimaster file replication:
Be aware of how changes made in replicated file areas, including the bulk reset of permissions or other file attributes by administrators or applications, can affect bandwidth.
Any changes to the file system will eventually occur on all other members of the replication set. Do not try to speed up the process by making the same change on other FRS replication partners. This could result in data errors.
If, after modifying a file, you notice that it has somehow reverted back to a previous version, another operator or application might be making changes in the same area, overwriting the earlier changes. In this case, try to find the other operator or application that is causing the problem.
Any files that you delete on one member will be deleted on all other members.
If you rename a file or folder so that it is moved out of the replication tree, FRS will treat it as a deletion on the other replication set members because the file or folder has disappeared from the scope of the replica set.
If two operators create a file or folder at the same time (or before the change has replicated), the file or folder will "morph," or receive a modified name, such as folder_ntfrs_012345678. FRS behaves this way in order to avoid data loss in such situations.
Keep the FRS service running at all times in order to avoid a journal wrap condition.
Table 2.6 shows common events and symptoms that indicate FRS problems and the solution or action required.
Table 2.6 Events and Symptoms that Indicate FRS Problems
Event or Symptom | Root Cause | Solution
FRS Event ID 13508 | FRS was unable to create an RPC connection to a replication partner. | If this message is not followed by an FRS event ID 13509, troubleshoot FRS event ID 13508 without FRS event ID 13509.
FRS Event ID 13509 | FRS was able to create an RPC connection to a replication partner. | No action required.
FRS Event ID 13511 | The FRS database is out of disk space. | Treat this as a priority 1 problem. Troubleshoot FRS event ID 13511.
FRS Event ID 13522 | The staging area is full. | If you are using Windows 2000 SP2 or earlier, treat this as a priority 1 problem. If you are using SP3, treat this as a priority 3 problem. Troubleshoot FRS event ID 13522.
FRS Event ID 13526 | The SID cannot be determined from the distinguished name. | Treat this as a priority 1 problem. Troubleshoot FRS event ID 13526.
FRS Event ID 13548 | System clocks are too far apart on replica members. | Treat this as a priority 1 problem. Troubleshoot FRS event ID 13548.
FRS Event ID 13557 | Duplicate connections are configured. | Treat this as a priority 1 problem. Troubleshoot FRS event ID 13557.
FRS Event ID 13567 | Excessive replication was detected and suppressed. | Treat this as a priority 2 problem. Troubleshoot FRS event ID 13567.
FRS Event ID 13568 | Journal wrap error. | If you are using Windows 2000 SP2 or earlier, treat this as a priority 2 problem. If you are using SP3, treat this as a priority 1 problem. Troubleshoot FRS event ID 13568.
Files are not replicating | Files can fail to replicate for a wide range of underlying reasons: DNS, file and folder filters, communication issues, topology problems, insufficient disk space, FRS servers in an error state, or sharing violations. | Troubleshoot files not replicating.
Modified folder names on other domain controllers | If duplicate folders are manually created on multiple domain controllers before they have been able to replicate, FRS preserves content by "morphing" folder names of the last folders to be created. | Troubleshoot morphed folders.
SYSVOL data appears on domain controllers, but \\<domain>\SYSVOL share appears to be empty | SYSVOL folders include a reparse point that points to the correct location of the data. You must take special steps to recover a deleted reparse point. | Troubleshoot the SYSVOL directory junction.
Excessive disk or CPU usage by FRS | A service or application is unnecessarily changing all or most of the files in a replica set on a regular basis. For example, an antivirus software package might be rewriting the ACL on many files, causing FRS to replicate these files unnecessarily. | Troubleshoot excessive disk and CPU usage by NTFRS.exe.
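The priorities in Table 2.6 can be captured as a small lookup for triage scripts. The following Python sketch is illustrative only: the names PRIORITIES and triage_priority are hypothetical, and treating any service pack level of 3 or higher as "SP3" is an assumption, because the table covers only SP3 and earlier.

```python
# Priorities transcribed from Table 2.6 (1 = most urgent).
# Each value is (priority on SP2 or earlier, priority on SP3).
PRIORITIES = {
    13511: (1, 1),  # FRS database is out of disk space
    13522: (1, 3),  # staging area is full
    13526: (1, 1),  # SID cannot be determined
    13548: (1, 1),  # system clocks too far apart
    13557: (1, 1),  # duplicate connections
    13567: (2, 2),  # excessive replication suppressed
    13568: (2, 1),  # journal wrap error
}


def triage_priority(event_id, service_pack):
    """Return the Table 2.6 priority, or None if the table assigns none.

    Assumption: service_pack >= 3 is treated like SP3.
    """
    row = PRIORITIES.get(event_id)
    if row is None:
        return None  # for example 13508 and 13509 carry no priority
    sp2_or_earlier, sp3 = row
    return sp3 if service_pack >= 3 else sp2_or_earlier
```

For example, triage_priority(13568, 3) returns 1, while the same event on SP2 returns 2.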
General Procedures for Troubleshooting FRS Problems
For troubleshooting FRS, you can use the Ntfrsutl.exe tool in the Windows 2000 Resource Kit. With Ntfrsutl, you can do the following:
Show the FRS configuration in Active Directory.
List the active replica sets in a domain.
Show the ID table, inbound log, or outbound log for a computer hosting FRS.
Examine memory usage by FRS.
List the application programming interface (API) and version number for FRS.
Poll immediately, quickly, or slowly for changes to the FRS configuration.
Ntfrsutl can be used on remote computers, so you can get status information for any member of a replica set from a single console.
For more information about troubleshooting FRS, see the File Replication Service (FRS) link on the Web Resources page at https://www.microsoft.com/windows/reskits/webresources.
Troubleshooting FRS Events 13508 without FRS Event 13509
FRS event ID 13508 is a warning that the FRS service has been unable to complete the RPC connection to a specific replication partner. It indicates that FRS is having trouble enabling replication with that partner and will keep trying to establish the connection.
A single FRS event ID 13508 does not mean anything is broken or not working, as long as it is followed by FRS event ID 13509, which indicates that the problem was resolved. Based on the time between FRS event IDs 13508 and 13509, you can determine if a real problem needs to be addressed.
Note: If FRS is stopped after an event ID 13508 is logged and then later started at a time when the communication issue has been resolved, event ID 13509 will not appear in the event log. In this case, look for an event indicating that FRS has started, and ensure it is not followed by another event 13508.
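The pairing of event 13508 with a later event 13509 can be sketched in a few lines of Python. This is a minimal illustration, not an FRS tool: the function name pair_connection_events is hypothetical, and it assumes you have already exported the relevant events for a single replication partner as (timestamp, event ID) tuples in chronological order.

```python
from datetime import datetime


def pair_connection_events(events):
    """Pair each 13508 warning with the next 13509 for one partner.

    events: iterable of (timestamp, event_id), sorted by timestamp.
    Returns a list of (warning_time, resolved_time or None).
    """
    pairs = []
    pending = None  # time of the earliest 13508 not yet followed by 13509
    for timestamp, event_id in events:
        if event_id == 13508 and pending is None:
            pending = timestamp
        elif event_id == 13509 and pending is not None:
            pairs.append((pending, timestamp))
            pending = None
    if pending is not None:
        pairs.append((pending, None))  # still unresolved
    return pairs
```

An entry whose second element is None marks a 13508 that has not been followed by a 13509. The difference between the two timestamps in a resolved pair shows how long the connection problem lasted.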
Because FRS servers gather replication topology information from the closest domain controller, a replica partner in another site will not be aware of the replica set until the topology information has been replicated to domain controllers in that site. When the topology information finally reaches that distant domain controller, the FRS partner in that site will be able to participate in the replica set and FRS event ID 13509 will be logged. Intrasite Active Directory replication partners replicate every five minutes. Intersite replication only replicates when the schedule is open (the shortest delay is 15 minutes). In addition, FRS polls the topology at defined intervals: five minutes on domain controllers, and one hour on other member servers of a replica set. These delays and schedules can delay propagation of the FRS replication topology, especially in topologies with multiple hops.
Procedures for Troubleshooting FRS Event 13508 without Event 13509
Examine the FRS event ID 13508 to determine the machine that FRS has been unable to communicate with.
Determine whether the remote machine is working properly, and verify that FRS is running on it. Type the following command at a command prompt on the computer that logged the FRS event ID 13508 and press ENTER:
ntfrsutl version <FQDN of remote domain controller>
If this fails, check network connectivity by using the Ping command to ping the fully qualified domain name (FQDN) of the remote domain controller from the computer that logged the FRS event ID 13508. If this fails, then troubleshoot as a DNS or TCP/IP issue. If it succeeds, confirm that the FRS service is started on the remote domain controller.
Determine whether FRS has ever been able to communicate with the remote computer by looking for FRS event ID 13509 in the event log, and check whether the FRS problem correlates with recent changes to networking, firewalls, DNS configuration, or Active Directory infrastructure.
Determine whether anything between the two machines is capable of blocking RPC traffic, such as a firewall or router.
Confirm that Active Directory replication is working. For more information about troubleshooting Active Directory replication, see Troubleshooting Active Directory Replication Problems in this guide.
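The DNS and connectivity checks in this procedure can be approximated with a short script. The Python sketch below swaps Ping for a TCP connection attempt to port 135, the well-known port of the RPC endpoint mapper. That substitution is an assumption made for illustration. The function name check_partner and its return strings are hypothetical, and a successful connection to port 135 does not prove that FRS itself is running.

```python
import socket

RPC_ENDPOINT_MAPPER_PORT = 135  # well-known TCP port for the RPC endpoint mapper


def check_partner(fqdn, resolve=socket.getaddrinfo,
                  connect=socket.create_connection, timeout=5.0):
    """Classify basic reachability of a replication partner.

    resolve and connect are injectable so the logic can be tested
    without a network; by default they use the standard socket calls.
    """
    try:
        resolve(fqdn, RPC_ENDPOINT_MAPPER_PORT)
    except OSError:
        return "dns-failure"  # troubleshoot as a DNS issue
    try:
        conn = connect((fqdn, RPC_ENDPOINT_MAPPER_PORT), timeout)
    except OSError:
        return "rpc-port-unreachable"  # firewall, router, or host down
    conn.close()
    return "reachable"
```

A "reachable" result means only that name resolution and a TCP connection succeeded. Confirm that the FRS service is started with ntfrsutl version as described in the procedure.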
Troubleshooting FRS Event 13511
FRS event ID 13511 is logged when the FRS database is out of disk space. To correct this situation, delete unnecessary files on the volume containing the FRS database. If this is not possible, then consider moving the database to a larger volume with more free space. For more information about how to move the database to a larger volume, see Knowledge Base article 221093: How to Relocate the NTFRS Jet Database and Log Files. To view this Knowledge Base article, see the Microsoft Knowledge Base link on the Web Resources page at https://www.microsoft.com/windows/reskits/webresources.
Troubleshooting FRS Event 13522
The Staging Directory is an area where modified files are stored temporarily either before being propagated to other replication partners or after being received from other replication partners. FRS encapsulates the data and attributes associated with a replicated file or directory object in a staging file. FRS needs adequate disk space for the staging area on both upstream and downstream machines in order to replicate files.
On Windows 2000 SP2 and earlier, FRS event 13522 indicates that the FRS service has paused because the staging area is full. Replication will resume if disk space for the staging area becomes available or if the disk space limit for the staging area is increased.
On Windows 2000 SP3, you must clear the replication backlog. Reasons why the staging area might fill up include:
One or more downstream partners are not accepting changes. This could be a temporary condition due to the schedule being turned off and FRS waiting for it to open, or a permanent state because the service is turned off, or the downstream partner is in an error state.
The rate of change in files exceeds the rate at which FRS can process them.
No obvious changes are made to the files but the staging area is filling up anyway. To troubleshoot this excessive replication, see "Troubleshooting FRS Event 13567" in this guide.
A parent directory for files that have a large number of changes is failing to replicate in, so all changes to its subdirectories are blocked.
Troubleshooting FRS Event 13526
FRS event ID 13526 is logged when a domain controller becomes unreachable. This problem occurs because FRS polls Active Directory at regular intervals to read FRS configuration information. During the polling, an operation is performed to resolve the security identifier (SID) of an FRS replication partner. The binding handle might become invalid if the bound domain controller becomes unreachable over the network or restarts within a single polling interval (the default is five minutes).
To resolve this issue, stop and start FRS on the computer logging the error message.
Troubleshooting FRS Event 13548
FRS event ID 13548 is logged when the time settings for two replication partners differ by more than 30 minutes. This error could be caused by the selection of an incorrect time zone on the local computer or its replication partner.
Check that the time zone and system clock are correctly set on both computers. They must be within 30 minutes of each other, but preferably much closer.
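The 30-minute rule is easiest to reason about when both clocks are expressed in UTC, which also exposes time zone mistakes. The Python sketch below is illustrative only. The function name clocks_too_far_apart is hypothetical, and how you obtain the partner's clock reading is left open.

```python
from datetime import datetime, timedelta, timezone

MAX_SKEW = timedelta(minutes=30)  # limit stated for FRS event ID 13548


def clocks_too_far_apart(local_utc, partner_utc, limit=MAX_SKEW):
    """Return True if two UTC clock readings differ by more than the limit."""
    return abs(local_utc - partner_utc) > limit
```

For example, two computers that show the same wall-clock time but are configured with time zones one hour apart differ by 60 minutes in UTC, so the check returns True.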
Troubleshooting FRS Event 13557
FRS event ID 13557 is logged when duplicate connections are detected between two replication partners. To resolve this problem, delete duplicate connection objects between the direct replication partners that are noted in the event text.
Troubleshooting FRS Event 13567
Event 13567 in the FRS event log is generated on computers running Windows 2000 SP3 when unnecessary file change activity is detected.
Unnecessary file change activity means that a file has been written by some user or application, but no change is actually made to the file. FRS detects that the file has not changed, and maintains a count of how often this happens. If the condition is detected more than 15 times per hour during a three-hour period, FRS logs the 13567 event.
Determine the application or user that is modifying file content. For procedures to troubleshoot this issue, see "Troubleshooting Excessive Disk and CPU Usage by NTFRS.EXE" in this guide. More information can also be found in Knowledge Base article 315045: FRS Event 13567 Is Recorded in the FRS Event Log with SP3. To view this Knowledge Base article, see the Microsoft Knowledge Base link on the Web Resources page at https://www.microsoft.com/windows/reskits/webresources.
Troubleshooting FRS Event 13568
FRS event ID 13568 contains the following message:
The File Replication Service has detected that the replica set "1" is in JRNL_WRAP_ERROR.
NTFS maintains a special log called the NTFS USN journal, which is a high-level description of all the changes to files and directories on an NTFS volume. FRS uses this mechanism in order to track changes to NTFS directories of interest, and to queue those changes for replication to other computers. The NTFS USN journal has defined size limits and will discard old log information on a first-in, first-out basis in order to maintain its correct size.
If FRS processing falls behind the NTFS USN journal, and if NTFS USN journal information that FRS needed has been discarded, then FRS enters a journal wrap condition. FRS then needs to rebuild its current replication state with respect to NTFS and other replication partners.
Each file change on the NTFS volume occupies approximately 100 bytes in this journal (possibly more, depending on the file name size). In general, the NTFS USN journal for an NTFS volume should be sized at 128 megabytes (MB) per 100,000 files being managed by FRS on that NTFS volume.
In Windows 2000 SP2 and earlier, the default journal size is 32 MB and the maximum journal size is 128 MB. In Windows 2000 SP3, the default journal size is 128 MB and the maximum journal size is 10,000 MB.
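The sizing guideline of 128 MB per 100,000 files is simple arithmetic. The Python sketch below works it through using the defaults and maximums quoted above. The function name recommended_journal_mb is hypothetical, and the choices to scale proportionally and never go below the default size are assumptions, not documented FRS behavior.

```python
MB_PER_100K_FILES = 128  # guideline: 128 MB per 100,000 files managed by FRS


def recommended_journal_mb(file_count, default_mb=128, max_mb=10_000):
    """Estimate a USN journal size in MB for a volume.

    Defaults are the Windows 2000 SP3 values; pass default_mb=32 and
    max_mb=128 for SP2 and earlier.
    """
    # Integer ceiling of file_count * 128 / 100,000
    scaled = -(-file_count * MB_PER_100K_FILES // 100_000)
    # Assumption: never below the default, never above the maximum
    return min(max(scaled, default_mb), max_mb)
```

For 250,000 files the guideline gives 320 MB on SP3. On SP2 and earlier the same volume is capped at the 128 MB maximum, which suggests why large replica sets on those versions are more exposed to journal wrap.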
The journal size can be configured with a registry subkey, but keep in mind that once you increase journal size you should not lower it again because this will cause a journal wrap. To learn how the USN journal size can be increased see Knowledge Base article 221111: Description of FRS Entries in the Registry. To view this Knowledge Base article, see the Microsoft Knowledge Base link on the Web Resources page at https://www.microsoft.com/windows/reskits/webresources/.
FRS can encounter journal wrap conditions in the following cases:
Many files are added at once to a replica tree while FRS is busy, starting up, or not running.
On a server that is being used for authoritative restore, or as the primary server for a new replica partner, excessive file activity at the start of this process can consume NTFS USN journal records. Size the NTFS USN journal at 128 MB per 100,000 files being managed by FRS, as mentioned above, to avoid this condition.
NTFS needs to be processed with Chkdsk and Chkdsk corrects the NTFS structure. In this case, NTFS creates a new NTFS USN journal for the volume or deletes the corrupt entries from the end of the journal.
The NTFS USN journal is deleted or reduced in size.
FRS is in an error state that prevents it from processing changes in the NTFS USN journal.
If FRS is experiencing journal wrap errors on a particular server, it cannot replicate files until the condition has been cleared. To continue replication, the administrator must stop FRS on that server and perform a non-authoritative restore of the data so that the system can synchronize with its replication partners. For more information about performing a non-authoritative restore, see "Performing a Non-Authoritative Restore" in this guide.
Note the following:
Windows 2000 SP1 cannot perform this process automatically.
In Windows 2000 SP2, FRS performs this process automatically.
In Windows 2000 SP3, FRS does not perform this process automatically. This behavior was changed because the automatic restore typically ran at times that administrators had not planned. A registry setting is available that allows FRS to perform the automatic nonauthoritative restore, just as in Windows 2000 SP2, but leaving this as a manual process is recommended.
For more information about performing the nonauthoritative restore process on a server, see Knowledge Base article 292438: Troubleshooting Journal Wrap Errors on SYSVOL and DFS Replica Sets. To view this Knowledge Base article, see the Microsoft Knowledge Base link on the Web Resources page at https://www.microsoft.com/windows/reskits/webresources/.
Troubleshooting Files Not Replicating
Files can fail to replicate for a wide range of causes. As a best practice, find the root cause of FRS replication problems.
Procedures for Troubleshooting Files that Are Not Replicating
Verify that Active Directory replication is functioning. For more information about troubleshooting Active Directory replication, see "Troubleshooting Active Directory Replication Problems" in this guide. Each domain controller must have at least one inbound connection to another domain controller in the same domain.
Examine the event logs on the machines involved. Resolve any problems found.
Run the ntfrsutl version command from the source computer against the destination computer, and vice versa. Verify that the addresses are correct. Verify RPC connectivity between the source and destination. Also verify that FRS is running.
Use the Services administrative console to confirm that FRS is running on the remote computer.
If FRS is not running, review the File Replication service event log on the problem computer. If the service has asserted, troubleshoot the assertion. Otherwise, restart the service by using the net start ntfrs command.
Verify that Active Directory replication is functioning. If it is not, see "Troubleshooting Active Directory Replication Problems" in this guide.
Use Active Directory Sites and Services to verify the replication schedule on the connection object to confirm that replication is enabled between the source and destination computers and also that the connection is enabled. The connection object is the inbound connection from the destination computer under the source computer's NTFRS_MEMBER object. For SYSVOL, the connection object resides under \Servers\server_name\NTDS Settings.
Create a test file on the destination computer, and verify its replication to the source computer, taking into account the schedule and link speed for all hops between the two computers.
Check for files that are larger than the amount of free space on the source or destination server or larger than the size of the staging area directory limit in the registry. Resolve the disk space problem or increase the maximum staging area file space. For more information about troubleshooting staging area problems, see "Troubleshooting FRS Event 13522" in this guide.
Check whether the source file was excluded from replication. Confirm that the file is not encrypted by using Encrypting File System (EFS), is not an NTFS junction point (as created by Linkd.exe from the Windows 2000 Server Resource Kit), and is not excluded by a file or folder filter on the originating replica member. FRS does not replicate files or directories that meet any of these conditions.
Check whether the file is locked on either computer. Use the net file command on the source and destination computers. This command indicates which users are holding the file open on the network, but will not report any files being held open by local processes.
If the file is locked on the source computer, then FRS will be unable to read the file to generate the staging file, and replication will be delayed. If the file is locked on the destination computer, then FRS will be unable to update the file. In this case, FRS continues to retry the update until it succeeds. The retry interval is 30 to 60 seconds.
If files are being held open by remote users, you can use the net file <id> /close command to force the file closed.
If these methods do not resolve the issue, you can investigate the FRS debug logs to get more details on what is causing the replication to fail. FRS creates text-based logs named ntfrs_*.log in the %systemroot%\debug directory to help you debug problems. Debug logs effectively describe a two-way conversation between replication partners. A higher number in the file name indicates a more recent log (for example, ntfrs_0001.log is oldest and ntfrs_0005.log is newest).
To observe a particular event, take a snapshot of the log files as close to the occurrence of the event as possible. Save the log files in a different directory so they can be examined afterward. Debug lines containing the string :T: are known as "tracking records" and are typically the most useful for understanding why specific files fail to replicate. You can redirect records of interest to a text file using the FINDSTR command. For example:
findstr /I ":T:" %systemroot%\debug\ntfrs_*.log >trackingrecords.txt
findstr /I "error warn fail S0" %systemroot%\debug\ntfrs_*.log >errorscan.txt
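If you copy the saved log files to a computer where FINDSTR is not available, the same tracking-record filter can be sketched in Python. This is a rough equivalent of the first FINDSTR command, not a parser for the FRS log format. The function name tracking_records is hypothetical.

```python
def tracking_records(lines):
    """Yield lines containing the ':T:' marker, ignoring case like findstr /I."""
    for line in lines:
        if ":t:" in line.lower():
            yield line.rstrip("\n")
```

To use it on a saved snapshot, open each copied ntfrs_*.log file and pass the file object to tracking_records, then write the results to a text file for review.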
Important: SYSVOL uses FRS as the means to replicate data. When troubleshooting FRS, focus on how to enable it to run again, instead of trying to "help" replication by manually copying files to replication partners. Manual copying can be used as a stopgap, but it requires reinitializing the entire replica set. Manually copying files can cause additional replication traffic, backlogs, and potential replication conflicts. For more information about replication conflicts, see "Troubleshooting Morphed Folders" later in this guide.
Verifying the FRS Topology in Active Directory
Because FRS servers gather their replication topology information from their closest Active Directory domain controller, FRS replication relies on Active Directory replication functioning properly. Two approaches to verifying that Active Directory is replicating FRS replication topology information correctly include:
Verify the FRS topology in Active Directory from multiple servers.
For more information about verifying the FRS topology, see the File Replication Service (FRS) link on the Web Resources page at https://www.microsoft.com/windows/reskits/webresources/.
Troubleshooting Morphed Folders
All files and folders that FRS manages are uniquely identified internally by a special file identifier. FRS uses these identifiers as the canonical identifiers of files and folders that are being replicated. If FRS receives a change order to create a folder that already exists, which by definition has a different file identifier than the duplicate folder, FRS protects the conflicting change by leaving the original directory structure intact, and renaming the conflicting directory to a unique name so that underlying files and folders can be preserved.
The conflicting folder will be given a new name of the form FolderName_NTFRS_<guidname>, where FolderName is the original name of the folder and <guidname> is a unique character string such as "001a84b2."
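When you need to find morphed folders in a large replica tree, the naming pattern can be matched with a regular expression. The Python sketch below is illustrative only. The function name split_morphed_name is hypothetical, and treating the suffix as a run of hexadecimal digits is an assumption based on the examples in this guide.

```python
import re

# Assumption: the unique suffix consists of hexadecimal digits,
# as in "001a84b2". Matching ignores case.
MORPHED = re.compile(r"^(?P<original>.+)_NTFRS_(?P<suffix>[0-9a-f]+)$",
                     re.IGNORECASE)


def split_morphed_name(name):
    """Return (original_name, suffix) for a morphed name, else None."""
    match = MORPHED.match(name)
    if match is None:
        return None
    return match.group("original"), match.group("suffix")
```

For example, split_morphed_name("Scripts_NTFRS_001a84b2") returns ("Scripts", "001a84b2"), and an ordinary folder name returns None.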
Two common causes of this condition are:
A folder is created on multiple machines in the replica set before the folder has been able to replicate. This could be due to the administrator or application duplicating folders of the same name on multiple FRS members.
You initiate an authoritative restore on one server and either:
Did not stop the service on all other members of the reinitialized replica set before restarting FRS after the authoritative restore, or
Did not set the D2 registry key for the authoritative restore on all other members of the reinitialized replica set before a server replicated outbound changes to reinitialized members of the replica set.
Manually copied directories with names identical to those being replicated by FRS to computers in the replica set.
For more information about performing an authoritative restore, see "Active Directory Backup and Restore" in this guide.
To recover from morphed folders you have two options:
Move the morphed directories out of the replica tree and back in.
Rename the morphed directories.
The first method works well for small amounts of data on a small number of targets. However, if you miss end-to-end replication of the move-out, this method can cause morphed directories. This method also forces all members to re-replicate data. The second method does not require re-replication of data. However, it can cause a denial-of-service condition by giving an invalid path when the originating path is renamed.
Procedures for Moving Morphed Directories Out of the Replica Tree and Back In
Move all morphed directories out of the tree.
Wait for end-to-end removal of data on all targets.
While waiting, build a tree containing the desired files and folder versions, including permissions and other attributes.
Verify end-to-end deletion of the "move-out" on all targets, otherwise you get a conflict in the next step. Perform a nonauthoritative restore of computers that did not replicate in the deletion. Disable FRS on computers that you could not restore. For more information about authoritative and nonauthoritative restores, see "Active Directory Backup and Restore" in this guide.
Move the data from outside the tree back into the replicated tree. Use the SCOPY or XCopy /O command to preserve permissions.
Procedures for Renaming Morphed Directories
From the computer that originated the good series in conflict, rename both the good and morphed variants to a unique name.
Verify end-to-end replication of the rename operation across all members of the set. For those that do not get the rename within the necessary point in time, stop FRS and set the D2 registry setting for a nonauthoritative restore. Do not restart the computer at this time.
Move any files from the now renamed morphed folders to the renamed good folders.
Verify end-to-end replication of the files in the renamed original folder.
Delete the original morphed files.
Restart FRS to start the nonauthoritative restore. After the rename has propagated, it can be deleted. Before deleting any of the folders, ensure that you have a backup of the original (and complete) folder.
Troubleshooting the SYSVOL Directory Junction
The SYSVOL share contains two folders that are directory junctions that point to other folders, much like a symbolic link.
Procedures for Troubleshooting the SYSVOL Directory Junction
At a command prompt, type the following commands and press ENTER:
dir <drive>:<path>\SYSVOL\SYSVOL
dir <drive>:<path>\SYSVOL\Staging Areas
Verify that junction points are in place. The following output example shows junction points.
D:\WINNT\SYSVOL\sysvol>dir
06/26/2001 01:23p <DIR> .
06/26/2001 01:23p <DIR> ..
06/26/2001 01:23p <JUNCTION> corp.com
D:\WINNT\SYSVOL\staging areas>dir
06/26/2001 01:23p <DIR> .
06/26/2001 01:23p <DIR> ..
06/26/2001 01:23p <JUNCTION> corp.com
If either of the two junction points is missing, use the Linkd.exe tool from the Windows 2000 Server Resource Kit to recreate them. At a command prompt, type the following command and press ENTER:
linkd <drive>:<path>\SYSVOL\SYSVOL\<fully qualified domain name> <drive>:<path>\SYSVOL\<domain>
linkd <drive>:<path>\SYSVOL\Staging Areas\<fully qualified domain name> <drive>:<path>\SYSVOL\<domain>
Verify the same path for staging and staging areas.
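When you check many domain controllers, the junction check on saved dir output can be scripted. The Python sketch below is a minimal illustration that assumes you have captured the text of the dir command and that junction entries are marked with <JUNCTION> as in the output example. The function name junction_names is hypothetical.

```python
def junction_names(dir_output):
    """Return the names of entries marked <JUNCTION> in dir output text."""
    names = []
    for line in dir_output.splitlines():
        _, marker, after = line.partition("<JUNCTION>")
        if marker:  # the line contained the <JUNCTION> marker
            names.append(after.strip())
    return names
```

An empty result for either the SYSVOL\SYSVOL or the SYSVOL\Staging Areas listing indicates that the junction point is missing and must be recreated.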
Caution: Take great care when copying folders that include directory junctions. When Xcopy copies such a tree in Windows 2000, it copies the junction, not the contents of the folder the junction points to. An administrator can accidentally delete SYSVOL by using the RD /S command on a copy made of SYSVOL. Use RD without the /S parameter instead, because RD /S will follow the directory junction, but the RD command without /S will not.
Troubleshooting Excessive Disk and CPU Usage by NTFRS.EXE
Extensive replication generators are applications or operations that change all or most of the files in a replica set on a regular basis without the changes being necessary. FRS monitors the USN journal for changes, and if it finds a change, it has to replicate this file. The applications that create extensive replication normally rewrite the ACL (in the case of file security policies and antivirus software) or rewrite the file (in the case of defragmentation software). In both cases, the content, permissions, and attributes on the file or directory are not really changed.
For Windows 2000 SP3, Event ID 13567 in the FRS event log records that this kind of "non change" was suppressed in order to prevent unnecessary replication. In versions of Windows 2000 earlier than SP3, extensive replication generators were the most common reason for staging areas to fill up. Administrators should still look for and eliminate extensive replication generators when using SP3, because the file comparison consumes disk and file resources.
You can use one of the following methods to identify excessive replication generators:
Selectively turn off common causes such as antivirus software, defragmentation tools, and file system policy, and determine if this activity declines.
Use the FileSpy tool from the Windows 2000 Server Resource Kit to identify file information.
Inspect the NTFRSUTL OUTLOG report to see which files are being replicated.
Inspect the USN journal tracking records in the FRS debug logs on computers running Windows 2000 SP2 or later with the following command:
Findstr /I ":U:" %systemroot%\debug\ntfrs_00*.log
For more information about troubleshooting excessive disk and CPU usage by Ntfrs.exe, see the following Knowledge Base articles:
284947: "Norton AntiVirus 7.x Makes Changes to Security Descriptors"
282791: "FRS: Disk Defragmentation Causes FRS Replication Traffic"
279156: "Effects of Setting File System Policy on a Disk Drive or Folder"
307777: "Possible Causes of a Full File Replication Service Staging Area"
To view these Knowledge Base articles, see the Microsoft Knowledge Base link on the Web Resources page at https://www.microsoft.com/windows/reskits/webresources/. For more information about troubleshooting high CPU usage on a domain controller, see Troubleshooting High CPU Usage on a Domain Controller in this guide.