Troubleshooting Active Directory Replication Problems

Applies To: Windows Server 2003, Windows Server 2003 R2, Windows Server 2003 with SP1, Windows Server 2003 with SP2

Active Directory replication problems can have several different sources. For example, Domain Name System (DNS) problems, networking issues, or security problems can all cause Active Directory replication to fail.

Inbound or outbound replication failure causes Active Directory objects that represent the replication topology, replication schedule, domain controllers, users, computers, passwords, security groups, group memberships, and Group Policy to be inconsistent between domain controllers. Directory inconsistency causes either operational failures or inconsistent results, depending on the domain controller that is contacted for the operation at hand. Active Directory depends on network connectivity, name resolution, authentication and authorization, the directory database, the replication topology, and the replication engine. When the root cause of a replication problem is not immediately obvious, finding it among the many possible causes requires systematically eliminating the probable ones.

Event and Tool Solution Recommendations

Ideally, the red (Error) and yellow (Warning) events in the Directory Service event log suggest the specific constraint that is causing replication failure on the source or destination domain controller. If the event message suggests steps for a solution, try the steps listed in the event. The Repadmin tool and other diagnostic tools also provide information that can help you resolve replication failures.

Ruling Out the Obvious

Sometimes replication errors occur because of intentional disruptions. Therefore, when you troubleshoot Active Directory replication problems, first rule out intentional disconnections and hardware failures or upgrades.

Intentional Disconnections

If a domain controller reports replication errors while it attempts to replicate with a domain controller that was built in a staging site and is currently offline awaiting deployment to its final (remote) production site, those errors are expected and can be accounted for. To avoid separating a domain controller from the replication topology for an extended period, which causes continuous errors until the domain controller is reconnected, consider adding such computers initially as member servers and then using the install-from-media method to install Active Directory. You can back up an up-to-date domain controller to removable media (CD, DVD, or other media) and ship the media to the destination site. Then you can use the media to promote the domain controllers at that site without requiring a full replication of the directory over the network. For more information about installing from media, see Installing a Domain Controller in an Existing Domain Using Restored Backup Media.

Hardware Failures or Upgrades

If replication problems occur as a result of hardware failure (for example, failure of the motherboard, disk subsystem, or hard drive), notify the server owner so that the hardware problem can be resolved.

Periodic hardware upgrades can also cause domain controllers to be out of service. Ensure that your server owners have a reliable process for communicating such outages in advance.

Correct Response to Any Outdated Server Running Windows 2000 Server

If a domain controller running Windows 2000 Server has failed for longer than the number of days in the tombstone lifetime, the solution is always the same:

  1. Move the server from the corporate network to a private network.

  2. Either forcefully remove Active Directory or reinstall the operating system.

  3. Remove the server metadata from Active Directory so that the server object cannot be revived.

Note

By default, NTDS Settings objects that are deleted are revived automatically for a period of 14 days. Therefore, if you do not remove the server metadata (use Ntdsutil to perform metadata cleanup), the metadata is reinstated in the directory and other domain controllers continue to attempt replication with the server. Errors are then logged persistently because replication with the missing domain controller cannot succeed.
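Whether a server falls under this rule is a date comparison: the time since its last successful inbound replication against the tombstone lifetime. The following Python sketch shows that comparison. The helper name is hypothetical, and the default of 60 days is an assumption: it is the usual value in forests created on Windows 2000 Server or on Windows Server 2003 without SP1, while forests created on Windows Server 2003 with SP1 default to 180 days. Read the tombstoneLifetime attribute of the Directory Service object in the configuration partition for your forest's actual value.

```python
from datetime import datetime, timedelta

# Assumption: 60 days is the default tombstone lifetime in forests created on
# Windows 2000 Server or on Windows Server 2003 without SP1; forests created on
# Windows Server 2003 with SP1 default to 180 days. Read the forest's
# tombstoneLifetime attribute rather than trusting this constant.
DEFAULT_TOMBSTONE_LIFETIME_DAYS = 60


def exceeds_tombstone_lifetime(last_success, now,
                               tombstone_lifetime_days=DEFAULT_TOMBSTONE_LIFETIME_DAYS):
    """Hypothetical helper: True if the gap since the last successful inbound
    replication is longer than the tombstone lifetime."""
    return now - last_success > timedelta(days=tombstone_lifetime_days)


if __name__ == "__main__":
    now = datetime(2008, 6, 30)
    print(exceeds_tombstone_lifetime(datetime(2008, 3, 1), now))       # True (121 days)
    print(exceeds_tombstone_lifetime(datetime(2008, 6, 1), now))       # False (29 days)
    print(exceeds_tombstone_lifetime(datetime(2008, 3, 1), now, 180))  # False (121 < 180)
```

A server for which this returns True should be handled with the three steps above rather than reconnected.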

Root Causes

If you rule out intentional disconnections, hardware failures, and outdated Windows 2000 domain controllers, the remaining replication problems almost always have one of the following root causes:

  • Network connectivity: The network connection might be unavailable, or network settings might not be configured properly.

  • Name resolution: DNS misconfigurations are a common cause for replication failures.

  • Authentication and authorization: Authentication and authorization problems cause "Access denied" errors when a domain controller tries to connect to its replication partner.

  • Directory database (store): The directory database might not be able to process transactions fast enough to keep up with replication timeouts.

  • Replication engine: If intersite replication schedules are too short, replication queues might be too large to process in the time that the outbound replication schedule allows. In this case, replication of some changes can be stalled indefinitely, potentially long enough to exceed the tombstone lifetime.

  • Replication topology: Domain controllers must have intersite links in Active Directory that map to real wide area network (WAN) or virtual private network (VPN) connections. If you create objects in Active Directory for the replication topology that are not supported by the actual site topology of your network, replication that requires the misconfigured topology fails.

General Approach to Fixing Problems

Use the following general approach to fixing replication problems:

  1. Monitor replication health daily, or use Repadmin.exe to retrieve replication status daily.

  2. Attempt to resolve any reported failure in a timely manner by using the methods described in event messages and this guide. If software might be causing the problem, uninstall the software before you continue with other solutions.

  3. If the problem that is causing replication to fail cannot be resolved by any known methods, remove Active Directory from the server and then reinstall Active Directory. For more information about removing Active Directory, see Decommissioning a Domain Controller.

  4. If Active Directory cannot be removed normally while connected to the network, use one of the following methods to resolve the problem:

    • Force Active Directory removal in Directory Services Restore Mode, clean up server metadata, and then reinstall Active Directory.

    • Reinstall the operating system, and rebuild the domain controller.

    For more information about forcing Active Directory removal, see Forcing the Removal of a Domain Controller.

Monitoring Replication Health

Monitoring for replication failures is critical to being able to solve replication problems quickly and effectively. Use one of the following methods to monitor replication health:

  • Use a monitoring application that you set to capture and report specific errors and events on a daily basis.

  • Use the Repadmin tool to retrieve replication status daily.

Note

For detailed information on how to use Repadmin, see Monitoring and Troubleshooting Active Directory Replication Using Repadmin (https://go.microsoft.com/fwlink/?LinkId=122830).

Using a Monitoring Application to Monitor Replication Health

For all domain controllers in a forest, monitor replication health on a daily basis by using Microsoft Operations Manager (MOM) or an equivalent monitoring application. For information about using MOM to monitor Active Directory, see Active Directory Management Pack Technical Reference for MOM 2005 on the Microsoft Web site (https://go.microsoft.com/fwlink/?LinkId=41369).

Using Repadmin to Retrieve Replication Status

Replication status is an important indicator of the health of the directory service. If replication is working without errors, you know which domain controllers are online. You also know that the following systems and services are working:

  • DNS infrastructure

  • Kerberos

  • Windows Time service (W32time)

  • Remote procedure call (RPC)

  • Network connectivity

Use Repadmin (Windows Support Tools) to monitor replication status daily by running a command that assesses the replication status of all domain controllers in your forest. The command generates a .csv file that you can open in Excel and filter for replication failures.
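To run the command on a daily schedule, you might wrap it in a small script. The following Python sketch assumes Python 3.7 or later on an administrative workstation with Repadmin.exe (Windows Support Tools) on its PATH; the function names and the date-stamped file name are hypothetical. It passes the same arguments as the command in the procedure that follows and writes Repadmin's standard output to a file, which is what the > redirection does.

```python
import shutil
import subprocess
from datetime import date


def build_showrepl_command():
    # Same arguments as: repadmin /showrepl * /csv
    # "*" asks Repadmin for the status of every domain controller in the forest.
    return ["repadmin", "/showrepl", "*", "/csv"]


def capture_showrepl(output_path):
    """Run Repadmin and save its standard output to output_path, as the
    '>showrepl.csv' redirection does. Returns Repadmin's exit code."""
    result = subprocess.run(build_showrepl_command(),
                            capture_output=True, text=True)
    with open(output_path, "w", encoding="utf-8") as f:
        f.write(result.stdout)
    return result.returncode


if __name__ == "__main__":
    if shutil.which("repadmin") is None:
        print("repadmin not found on PATH; install Windows Support Tools first.")
    else:
        # Hypothetical naming scheme: one date-stamped file per daily run.
        code = capture_showrepl(f"showrepl-{date.today():%Y%m%d}.csv")
        print("repadmin exit code:", code)
```

The script deliberately saves the output even when Repadmin exits with an error, so that partial results are still available for review.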

Use the following procedure to retrieve the replication status of all domain controllers in the forest.

Requirements

  • Administrative credentials: To complete this procedure, you must be a member of the Domain Admins group in the forest root domain or the Enterprise Admins group in the forest.

  • Tools:

    Repadmin.exe (Windows Support Tools)

    Excel (Microsoft Office)

To retrieve replication status

  1. Open a command prompt, type the following command, and then press ENTER:

    repadmin /showrepl * /csv >showrepl.csv

  2. In Excel, on the File menu, click Open.

  3. In Files of type, click Text Files (*.prn;*.txt;*.csv).

  4. In Look in, navigate to showrepl.csv, and then click Open.

  5. In the Excel spreadsheet, right-click the column heading for showrepl_COLUMNS (column A) and then click Hide. Repeat for the column labeled Transport Type.

  6. Select the row just under the column headings, and then, on the Window menu, click Freeze Panes.

  7. Click the upper-left corner of the spreadsheet to highlight the entire spreadsheet. On the Data menu, point to Filter, and then click AutoFilter.

  8. In the heading of the Last Success column, click the down arrow, and then click Sort Ascending.

  9. In the heading of the Source DC column, click the down arrow, and then click Custom. In the Custom AutoFilter dialog box, complete the custom filter as follows:

    1. Under Source DC, click does not contain.

    2. In the corresponding text box, type del to filter deleted domain controllers from the spreadsheet.

  10. In the heading of the Last Failure column, click the down arrow, and then click Custom. In the Custom AutoFilter dialog box, complete the custom filter as follows:

    1. Under Last Failure, click does not equal.

    2. In the corresponding text box, type 0 to show only the domain controllers that are experiencing failures.

For every domain controller in the forest, the spreadsheet shows the source replication partner, the time that replication last succeeded, and the time that the last replication failure occurred for each naming context (directory partition). By using AutoFilter in Excel, you can view the replication health of working domain controllers only, failing domain controllers only, or the least or most current domain controllers, and you can see which replication partners are replicating successfully.
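The filtering in steps 7 through 10 can also be scripted. The following Python sketch reads the CSV text and applies the same three rules: it drops rows whose Source DC contains "del", keeps rows whose Last Failure value is not 0, and sorts by Last Success. The column labels are the ones this procedure uses; they are assumptions, because the header row that your version of Repadmin writes may use longer labels, so check the first row of showrepl.csv. The function name and sample data are hypothetical.

```python
import csv
import io

# Column labels as named in the procedure. Assumption: adjust these to match
# the header row that Repadmin actually writes to showrepl.csv.
SOURCE_DC = "Source DC"
LAST_SUCCESS = "Last Success"
LAST_FAILURE = "Last Failure"


def failing_links(csv_text):
    """Mimic the AutoFilter steps and return the matching rows as dicts."""
    rows = csv.DictReader(io.StringIO(csv_text))
    kept = [
        row for row in rows
        # Like Excel's "does not contain", this is case-insensitive, and it
        # also hides any DC whose name merely contains "del".
        if "del" not in row[SOURCE_DC].lower()
        and row[LAST_FAILURE].strip() != "0"
    ]
    # Text sort, as in Excel; assumes timestamps sort correctly as text
    # (for example, YYYY-MM-DD hh:mm:ss).
    return sorted(kept, key=lambda row: row[LAST_SUCCESS])


if __name__ == "__main__":
    sample = (
        "showrepl_COLUMNS,Source DC,Last Success,Last Failure\n"
        "showrepl_INFO,DC2,2008-06-29 08:00:00,0\n"
        "showrepl_INFO,DC3,2008-06-20 08:00:00,1722\n"
        "showrepl_INFO,DC4 (DEL:5f2c),2008-05-01 08:00:00,8524\n"
        "showrepl_INFO,DC5,2008-06-25 08:00:00,5\n"
    )
    for row in failing_links(sample):
        print(row[SOURCE_DC], row[LAST_FAILURE])  # DC3 1722, then DC5 5
```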

Attempting to Resolve Problems

Replication problems are reported in event messages and in various error messages that occur when an application or service attempts an operation. Ideally, these messages are collected by your monitoring application or when you retrieve replication status.

Most replication problems are identified in the event messages that are logged in the Directory Service event log. Replication problems might also be identified in the form of error messages in the output of the repadmin /showrepl command.

repadmin /showrepl Error Messages That Indicate Replication Problems

To identify Active Directory replication problems, use the repadmin /showrepl command as described in the previous section. The following table shows error messages that are generated by this command, along with the root causes of the errors and links to topics that provide solutions for the errors.

repadmin /showrepl Error Messages

  • Repadmin error: The time since last replication with this server has exceeded the tombstone lifetime.
    Root cause: A domain controller has failed inbound replication with the named source domain controller long enough for a deletion to have been tombstoned, replicated, and garbage-collected from Active Directory.
    Solution: Event ID 2042: It has been too long since this machine replicated

  • Repadmin error: No inbound neighbors.
    Root cause: If no items appear in the “Inbound Neighbors” section of the output that is generated by repadmin /showrepl, the domain controller was not able to establish replication links with another domain controller.
    Solution: Fixing Replication Connectivity Problems (Event ID 1925)

  • Repadmin error: Access is denied.
    Root cause: A replication link exists between two domain controllers, but replication cannot be performed properly due to an authentication failure.
    Solution: Fixing Replication Security Problems

  • Repadmin error: Last attempt at <date - time> failed with the “Target account name is incorrect.”
    Root cause: This problem can be related to connectivity, DNS, or authentication issues. If this is a DNS error, the local domain controller could not resolve the globally unique identifier (GUID)–based DNS name of its replication partner.
    Solutions: Fixing Replication DNS Lookup Problems (Event IDs 1925, 2087, 2088); Fixing Replication Security Problems; Fixing Replication Connectivity Problems (Event ID 1925)

  • Repadmin error: LDAP Error 49.
    Root cause: The domain controller computer account might not be synchronized with the Key Distribution Center (KDC).
    Solution: Fixing Replication Security Problems

  • Repadmin error: Cannot open LDAP connection to local host
    Root cause: The administration tool could not contact Active Directory.
    Solution: Fixing Replication DNS Lookup Problems (Event IDs 1925, 2087, 2088)

  • Repadmin error: Active Directory replication has been preempted.
    Root cause: The progress of inbound replication was interrupted by a higher priority replication request, such as a request generated manually with the repadmin /sync command.
    Solution: Wait for replication to complete. This informational message indicates normal operation.

  • Repadmin error: Replication posted, waiting.
    Root cause: The domain controller posted a replication request and is waiting for an answer. Replication is in progress from this source.
    Solution: Wait for replication to complete. This informational message indicates normal operation.
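When repadmin /showrepl output is collected by a script rather than read by hand, the table above can be encoded as a lookup from message text to the topics to consult. The following Python sketch is a hypothetical helper. It matches case-insensitive fragments of the messages exactly as the table words them, so messages worded differently in other Repadmin versions would not be matched.

```python
# Message fragments and solution topics taken from the table above.
SHOWREPL_SOLUTIONS = {
    "exceeded the tombstone lifetime": [
        "Event ID 2042: It has been too long since this machine replicated"],
    "no inbound neighbors": [
        "Fixing Replication Connectivity Problems (Event ID 1925)"],
    "access is denied": ["Fixing Replication Security Problems"],
    "target account name is incorrect": [
        "Fixing Replication DNS Lookup Problems (Event IDs 1925, 2087, 2088)",
        "Fixing Replication Security Problems",
        "Fixing Replication Connectivity Problems (Event ID 1925)"],
    "ldap error 49": ["Fixing Replication Security Problems"],
    "cannot open ldap connection to local host": [
        "Fixing Replication DNS Lookup Problems (Event IDs 1925, 2087, 2088)"],
    # Informational messages: these indicate normal operation.
    "replication has been preempted": ["Wait for replication to complete."],
    "replication posted, waiting": ["Wait for replication to complete."],
}


def solutions_for(message):
    """Return the solution topics for every known fragment found in message;
    an unrecognized message returns an empty list."""
    text = message.lower()
    topics = []
    for fragment, solutions in SHOWREPL_SOLUTIONS.items():
        if fragment in text:
            topics.extend(solutions)
    return topics


if __name__ == "__main__":
    print(solutions_for("Access is denied."))
    # ['Fixing Replication Security Problems']
```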

Event Messages That Indicate Active Directory Replication Problems

The following table lists common events that might indicate problems with Active Directory replication, along with root causes of the problems and links to topics that provide solutions for the problems.

Events That Indicate Active Directory Replication Problems

  • Event ID and source: 1311 (NTDS KCC)
    Root cause: The replication configuration information in Active Directory does not accurately reflect the physical topology of the network.
    Solution: Fixing Replication Topology Problems (Event ID 1311)

  • Event ID and source: 1388 (NTDS Replication)
    Root cause: Strict replication consistency is not in effect, and a lingering object has been replicated to the domain controller.
    Solution: Fixing Replication Lingering Object Problems (Event IDs 1388, 1988, 2042)

  • Event ID and source: 1925 (NTDS KCC)
    Root cause: The attempt to establish a replication link for a writable directory partition failed. This event can have different causes, depending on the error.
    Solutions: Fixing Replication Connectivity Problems (Event ID 1925); Fixing Replication DNS Lookup Problems (Event IDs 1925, 2087, 2088)

  • Event ID and source: 1988 (NTDS Replication)
    Root cause: The local domain controller has attempted to replicate, from a source domain controller, an object that is not present on the local domain controller because the object may have been deleted and already garbage-collected. Replication will not proceed for this directory partition with this partner until the situation is resolved.
    Solution: Fixing Replication Lingering Object Problems (Event IDs 1388, 1988, 2042)

  • Event ID and source: 2042 (NTDS Replication)
    Root cause: Replication has not occurred with this partner for a tombstone lifetime, and replication cannot proceed.
    Solution: Fixing Replication Lingering Object Problems (Event IDs 1388, 1988, 2042)

  • Event ID and source: 2087 (NTDS Replication)
    Root cause: Active Directory could not resolve the DNS host name of the source domain controller to an Internet Protocol (IP) address, and replication failed.
    Solution: Fixing Replication DNS Lookup Problems (Event IDs 1925, 2087, 2088)

  • Event ID and source: 2088 (NTDS Replication)
    Root cause: Active Directory could not resolve the DNS host name of the source domain controller to an IP address, but replication succeeded.
    Solution: Fixing Replication DNS Lookup Problems (Event IDs 1925, 2087, 2088)

  • Event ID and source: 2095 (NTDS Replication)
    Root cause: Update sequence number (USN) rollback has occurred and replication has been stopped. This error indicates an improper Active Directory restore, possibly of a virtual machine file (.vhd).
    Solution: For an explanation of this problem and recommendations for solutions, see Running Domain Controllers in Virtual Server 2005 on the Microsoft Web site (https://go.microsoft.com/fwlink/?LinkId=38330).

  • Event ID and source: 5805 (Net Logon)
    Root cause: A machine account failed to authenticate, which is usually caused by either multiple instances of the same computer name or the computer name not replicating to every domain controller.
    Solution: Fixing Replication Security Problems
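If your monitoring application exports events as (event ID, source) pairs, the table above can serve as a lookup that points each event at the topic to read. The following Python sketch keys on both the event ID and the source so that an unrelated event that happens to share an ID is not misrouted. The helper names are hypothetical, and normalizing source names by lowercasing and removing spaces is an assumption meant to tolerate spelling variants such as "Net Logon" and "NETLOGON".

```python
TOPOLOGY = "Fixing Replication Topology Problems (Event ID 1311)"
LINGERING = "Fixing Replication Lingering Object Problems (Event IDs 1388, 1988, 2042)"
CONNECTIVITY = "Fixing Replication Connectivity Problems (Event ID 1925)"
DNS = "Fixing Replication DNS Lookup Problems (Event IDs 1925, 2087, 2088)"
SECURITY = "Fixing Replication Security Problems"
USN_ROLLBACK = "Running Domain Controllers in Virtual Server 2005"


def _key(event_id, source):
    # Assumption: lowercase and drop spaces so that spelling variants of a
    # source name ("Net Logon", "NETLOGON") compare equal.
    return (event_id, source.replace(" ", "").lower())


EVENT_SOLUTIONS = {
    _key(1311, "NTDS KCC"): [TOPOLOGY],
    _key(1388, "NTDS Replication"): [LINGERING],
    _key(1925, "NTDS KCC"): [CONNECTIVITY, DNS],
    _key(1988, "NTDS Replication"): [LINGERING],
    _key(2042, "NTDS Replication"): [LINGERING],
    _key(2087, "NTDS Replication"): [DNS],
    _key(2088, "NTDS Replication"): [DNS],
    _key(2095, "NTDS Replication"): [USN_ROLLBACK],
    _key(5805, "Net Logon"): [SECURITY],
}


def topics_for_event(event_id, source):
    """Return the topics for an event; unknown (ID, source) pairs return []."""
    return list(EVENT_SOLUTIONS.get(_key(event_id, source), []))


if __name__ == "__main__":
    print(topics_for_event(1925, "NTDS KCC"))  # connectivity topic, then DNS topic
```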

For more information about replication concepts, see “Active Directory Replication Technologies” in the Windows Server 2003 Technical Reference on the Microsoft Web site (https://go.microsoft.com/fwlink/?LinkId=41950).

In this section

Fixing Replication Lingering Object Problems (Event IDs 1388, 1988, 2042)

Fixing Replication Security Problems

Fixing Replication DNS Lookup Problems (Event IDs 1925, 2087, 2088)

Fixing Replication Connectivity Problems (Event ID 1925)

Fixing Replication Topology Problems (Event ID 1311)