Understanding Urgent Replication

07/05/2008

I've found that a lot of people are confused about urgent replication and what exactly it's supposed to do. One common misconception is that urgent replication ensures that a change converges across the entire domain immediately. This isn't necessarily true. Let's look at this in more detail.

Change Notifications

So how does a DC know that it needs to replicate? Replication is a fairly complex process that I may talk about more in the future, but for now you just need to understand a couple of key concepts. When a change is made, the DC notifies its replication partners in the same site that it has a change for them. This is called change notification.
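As a minimal sketch, the partner-selection rule can be modeled in Python. The `DomainController` class and its method are hypothetical names for illustration, not a Windows or Active Directory API, and the model only captures which partners receive a notification, not how replication itself works.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DomainController:
    """Toy model of a DC: a name, the site it lives in, and its replication partners."""
    name: str
    site: str
    partners: list[DomainController] = field(default_factory=list)

    def partners_to_notify(self) -> list[str]:
        """Names of partners that get a change notification by default.

        Only partners in the same site are notified; partners in other sites
        pull changes on the site link schedule instead.
        """
        return [p.name for p in self.partners if p.site == self.site]


# DC2 is a bridgehead in Site A with one partner in Site A and one in Site B.
dc1 = DomainController("DC1", "Site A")
dc2 = DomainController("DC2", "Site A")
dc3 = DomainController("DC3", "Site B")
dc2.partners = [dc1, dc3]
print(dc2.partners_to_notify())  # prints ['DC1']
```

DC3 is left out because it sits in another site; by default it would pick up the change on the site link's schedule rather than through a notification.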

Changes Between Sites

Changes between DCs in different sites don't use change notifications. Instead, DCs request changes on the interval defined in the site link, and the lowest interval a site link allows is 15 minutes. What if we want to converge sooner? Consider the following architecture.

Here we have four sites connected in a chain: A to B, B to C, and C to D. With a replication interval of 15 minutes on each site link, a change made in Site A can take as long as 45 minutes to reach Site D: up to 15 minutes to reach Site B, 30 minutes cumulatively to reach Site C, and 45 minutes to reach Site D. There is a way to lower convergence time. It's called inter-site change notification, and it's not turned on by default. Inter-site change notifications work just like regular change notifications, except that they traverse sites. Because the replication partner in the other site is notified of changes, the site link's replication interval is effectively ignored: the DC that has the change notifies its partner in the other site, just as it does within a site. When you implement this setting, you need good links between these sites. I'd consider any latency under 500 ms good enough for inter-site change notifications. Most modern environments I have seen could turn on inter-site change notifications without any issues, but administrators are either unaware of the setting or think their links need to be better.
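A short Python sketch can check the arithmetic for the four-site chain. It assumes every site link uses the same 15-minute interval and, like the text, ignores intra-site delays. The function names are hypothetical; the one detail taken from Active Directory itself is that inter-site change notification is enabled by setting bit 0x1 in the site link object's `options` attribute, which the helper ORs in so that any other flags are preserved.

```python
from itertools import accumulate

# Bit 0x1 in a siteLink object's "options" attribute turns on change
# notification across that site link (it is off by default).
USE_NOTIFY = 0x1

MIN_INTERVAL_MINUTES = 15  # lowest replication interval a site link allows


def worst_case_arrival_minutes(site_links: int, interval_minutes: int = 15) -> list[int]:
    """Cumulative worst-case minutes for a change to cross each site link in a chain.

    Each link is polled on its schedule, so a change that arrives just after a
    poll waits a full interval at every hop. Intra-site delays are ignored.
    """
    if interval_minutes < MIN_INTERVAL_MINUTES:
        raise ValueError("site link interval cannot be below 15 minutes")
    return list(accumulate([interval_minutes] * site_links))


def enable_intersite_notify(options: int) -> int:
    """Return a siteLink options value with change notification set,
    keeping any other flag bits that were already set."""
    return options | USE_NOTIFY


# Site A -> B -> C -> D is a chain of three site links.
print(worst_case_arrival_minutes(3))  # prints [15, 30, 45]
print(enable_intersite_notify(0))     # prints 1
```

The last element of the list is the 45-minute worst case for Site D; ORing the flag rather than assigning it avoids clearing unrelated option bits on an existing site link.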

Notification Delay

So with inter-site change notification turned on in our sample design, we should have a pretty quick convergence time. But it's still not instantaneous. Why not? There are a couple of factors that cause replication delays that I didn't mention in our earlier example. When change notifications are used, a DC's partners aren't notified immediately when a change is made. In Windows Server 2003, there is a 15-second delay before the first replication partner is notified of a change (in Windows 2000, the first delay is 300 seconds). After that, each additional partner is notified 3 seconds after the previous one (in Windows 2000, 30 seconds). This is true for all change notifications. So let's take another look at our sample design, and this time examine the replication partners:
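To make the two delays concrete, here is a minimal Python sketch of when each partner hears about a single change. It assumes partners are notified one after another in a fixed order; the dictionary and function names are illustrative, not an Active Directory interface.

```python
# (first delay, delay between subsequent partners) in seconds, per the text.
NOTIFY_DELAYS = {
    "Windows 2000": (300, 30),
    "Windows Server 2003": (15, 3),
}


def notification_schedule(partner_count: int, version: str = "Windows Server 2003") -> list[int]:
    """Seconds after an originating write at which each partner is notified.

    Partners are assumed to be notified one at a time: the first after the
    initial delay, and each later one after the subsequent delay.
    """
    first, subsequent = NOTIFY_DELAYS[version]
    return [first + i * subsequent for i in range(partner_count)]


print(notification_schedule(4))                  # prints [15, 18, 21, 24]
print(notification_schedule(3, "Windows 2000"))  # prints [300, 330, 360]
```

Even the fastest case here starts at 15 seconds, which is why each DC in a chain adds its own delay before passing a change along.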

Site Link	Replication Partner 1	Replication Partner 2	Effective Replication Interval (minutes)
Site A - Site B	DC2 (Site A)	DC3 (Site B)	0
Site B - Site C	DC4 (Site B)	DC5 (Site C)	0
Site C - Site D	DC6 (Site C)	DC7 (Site D)	0

So what's going to happen when an originating write occurs on DC1? Inter-site change notifications are turned on now, so the replication interval is effectively 0. But DC1 isn't a bridgehead; DC2 is the bridgehead to Site B, so DC2 needs the change before it can replicate it onward. The change is made on DC1, and after 15 seconds DC2 is notified. DC2 initiates replication and gets the change. Now what? DC2 waits another 15 seconds before notifying its first replication partner, DC3, that it has a change to replicate. Then DC3 waits 15 seconds to notify DC4, and so on. The change crosses seven notification hops (DC1 to DC2, DC2 to DC3, and so on up to DC7 to DC8), each with a 15-second delay, so by the time DC8 receives the change, approximately 7 × 15 = 105 seconds have passed (not counting the time each replication takes), even though inter-site change notifications are turned on.
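The 105-second figure can be reproduced with a minimal Python sketch that walks the chain hop by hop. It assumes DC1 and DC2 sit in Site A, DC3 and DC4 in Site B, and so on, as the table implies, that every hop waits the 15-second first-partner delay, and that replication itself takes no time. `CHAIN` and `arrival_times` are hypothetical names.

```python
# The replication path from the table above: each tuple is (source DC, target DC).
CHAIN = [
    ("DC1", "DC2"),  # within Site A
    ("DC2", "DC3"),  # Site A -> Site B
    ("DC3", "DC4"),  # within Site B
    ("DC4", "DC5"),  # Site B -> Site C
    ("DC5", "DC6"),  # within Site C
    ("DC6", "DC7"),  # Site C -> Site D
    ("DC7", "DC8"),  # within Site D
]


def arrival_times(chain: list[tuple[str, str]], first_delay: int = 15) -> dict[str, int]:
    """Seconds after the write on the first DC at which each DC is notified.

    Every hop waits first_delay before the next DC is notified; the time the
    replication itself takes is ignored.
    """
    times = {chain[0][0]: 0}
    for source, target in chain:
        times[target] = times[source] + first_delay
    return times


print(arrival_times(CHAIN)["DC8"])  # prints 105
```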

Urgent Replication

Still, a convergence time of under 2 minutes is pretty good. But what about an account lockout? It's entirely possible for a user to hit a domain controller that the lockout hasn't replicated to yet. This is what urgent replication helps with. Urgent replication bypasses the notification delay and sends change notifications immediately. It only affects change notifications: if you don't have change notifications turned on between sites, replication still honors the replication interval on the site link. So in our example, where inter-site change notifications are turned on, an account lockout on DC1 can replicate to DC8 almost instantaneously.
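The difference between urgent and normal replication, with and without inter-site change notifications, can be compared with a small Python model. It is a sketch under stated assumptions: the hop list mirrors the example chain (four intra-site hops, three inter-site hops), transfer time is ignored, and an inter-site hop without notifications is charged the full 15-minute site link interval as its worst case. The names are hypothetical.

```python
# Hops in the example chain, DC1 -> DC8, tagged as intra-site or inter-site.
HOPS = ["intra", "inter", "intra", "inter", "intra", "inter", "intra"]


def worst_case_seconds(hops: list[str], urgent: bool, intersite_notify: bool,
                       first_delay: int = 15, interval_minutes: int = 15) -> int:
    """Worst-case seconds for a change to cross every hop.

    - A hop that uses change notification waits first_delay, or 0 if urgent.
    - An inter-site hop without change notification waits up to the full site
      link interval, whether or not the change is urgent.
    """
    total = 0
    for hop in hops:
        uses_notification = hop == "intra" or intersite_notify
        if uses_notification:
            total += 0 if urgent else first_delay
        else:
            total += interval_minutes * 60
    return total


print(worst_case_seconds(HOPS, urgent=False, intersite_notify=True))  # prints 105
print(worst_case_seconds(HOPS, urgent=True, intersite_notify=True))   # prints 0
print(worst_case_seconds(HOPS, urgent=True, intersite_notify=False))  # prints 2700
```

The last case shows why urgent replication alone does not guarantee fast convergence: without inter-site notifications, the three site links still contribute up to 45 minutes.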

Password Changes

This is just one scenario that illustrates urgent replication. Password changes sort of break the rules. When a password is changed, the change is immediately replicated to the PDC Emulator. This is different from urgent replication because it happens immediately, without any regard to the inter-site replication interval or to whether inter-site change notifications are turned on.

There is a reason the password change is immediately replicated to the PDC Emulator. If a user changes their password and then immediately logs on against a DC in a different site, the logon would probably fail because that DC wouldn't have the change yet. AD takes this scenario into account: when a DC sees an invalid password, it forwards the authentication request to the PDC Emulator, which should have the latest password. If the PDC Emulator authenticates the user successfully, the logon is processed. This happens behind the scenes and does not increment the user's bad password count (the badPwdCount attribute).
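The password-change path and the PDC Emulator fallback can be sketched as a toy Python model. The `DC` class, its plain-text password dictionary, and both functions are hypothetical and only illustrate the flow described above. One assumption goes beyond the text: the sketch counts a bad password only when the PDC Emulator also rejects it.

```python
from dataclasses import dataclass, field


@dataclass
class DC:
    """Toy DC: what it currently believes each user's password is (illustration only)."""
    name: str
    passwords: dict[str, str] = field(default_factory=dict)
    bad_pwd_count: dict[str, int] = field(default_factory=dict)


def change_password(user: str, new_password: str, dc: DC, pdc: DC) -> None:
    """Write the change locally and push it to the PDC Emulator at once,
    regardless of any site link schedule."""
    dc.passwords[user] = new_password
    pdc.passwords[user] = new_password


def authenticate(user: str, password: str, dc: DC, pdc: DC) -> bool:
    """Try the local DC first; on a mismatch, retry against the PDC Emulator."""
    if dc.passwords.get(user) == password:
        return True
    if pdc.passwords.get(user) == password:
        # Succeeded on the PDC Emulator, so the local mismatch is not counted.
        return True
    # Assumption for this sketch: only a failure on the PDC Emulator too is counted.
    dc.bad_pwd_count[user] = dc.bad_pwd_count.get(user, 0) + 1
    return False


# Alice changes her password on DC1 (Site A), then logs on against DC8 (Site D),
# which has not yet received the change through normal replication.
dc1, dc8, pdc = DC("DC1", {"alice": "old"}), DC("DC8", {"alice": "old"}), DC("PDC", {"alice": "old"})
change_password("alice", "new", dc1, pdc)
print(authenticate("alice", "new", dc8, pdc))  # prints True
print(dc8.bad_pwd_count.get("alice", 0))       # prints 0
```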

Urgent replication is different from immediate replication and on-demand replication, so be careful not to confuse them. The key takeaway is that urgent replication does not guarantee immediate convergence; it only removes the delay in change notifications.
