How Does KeMemoryBarrier Work?
KeMemoryBarrier is a kernel DDK support macro. There is also a Win32 macro called MemoryBarrier that is implemented identically (there is an observation test hidden here!) - so we will just talk about KeMemoryBarrier here, but everything we say about it also applies to MemoryBarrier.
If you read the doc for KeMemoryBarrier, it says this:
"The KeMemoryBarrier routine creates a barrier at its position in the code—across which the compiler and the processor cannot move any operations."
Sounds good! However, I like to see where the bodies are buried, so let's look and see exactly how KeMemoryBarrier does this. Here is its definition for the x86 platform - from the public WDK header file wdm.h:
FORCEINLINE
VOID
KeMemoryBarrier (
    VOID
    )
{
    LONG Barrier;
    __asm {
        xchg Barrier, eax
    }
}
Hmm - this looks a little worrying. I don't immediately see anything that will block either compiler reordering or processor reordering! What gives? Well - first off we need to look at other constructs that block compiler and processor reordering and see if we can infer something from them. KeMemoryBarrierWithoutFence is documented to block compiler reordering. It is defined as:
#define KeMemoryBarrierWithoutFence() _ReadWriteBarrier()
Ok - that doesn't help much since we don't know what _ReadWriteBarrier does. So let's look at its definition:
VOID
_ReadWriteBarrier(
VOID
);
#pragma intrinsic(_ReadWriteBarrier)
Aha! _ReadWriteBarrier is an intrinsic - is that why it blocks compiler reordering? The "Compiler Intrinsics" topic on MSDN includes this:
"Some intrinsics, such as __assume and __ReadWriteBarrier, provide information to the compiler, which affects the behavior of the optimizer."
Great! But wait - although this is interesting knowledge - it helps us exactly zero in our quest to dissect KeMemoryBarrier, because KeMemoryBarrier is neither an intrinsic nor a re-#define of one. So what next? Well - maybe the fact that it has inline assembler means something? Let's go ask MSDN:
"The presence of an __asm block in a function affects optimization in several ways. First, the compiler doesn't try to optimize the __asm block itself."
Ok - still a little ambiguous - but it is looking like the fact that it is inline assembler is the reason that the compiler won't reorder around it - and in fact that is what is happening here. But that still leaves us with the looming question of how KeMemoryBarrier prevents processor reordering. KeMemoryBarrier only has one instruction in it:
xchg Barrier, eax
Is that enough to prevent processor reordering? Well - normally a locked operation or a serializing instruction on the processor is required to prevent processor reordering, but this isn't a locked operation or a serializing instruction - or is it? Let's go to the processor manuals. The Intel IA-32 instruction set manual has this to say:
"If a memory operand is referenced, the processor’s locking protocol is automatically implemented for the duration of the exchange operation, regardless of the presence or absence of the LOCK prefix or of the value of the IOPL. (See the LOCK prefix description in this chapter for more information on the locking protocol.) This instruction is useful for implementing semaphores or similar data structures for process synchronization. (See “Bus Locking” in Chapter 7 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A, for more information on bus locking.)"
So that is the answer to the other half of the mystery. Although the lock prefix is not present in the code - it is implicit in the instruction itself because we are referencing memory. In my opinion it should be explicitly called out in the code as well for clarity but that is just me.
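For readers who want to try the same trick outside the WDK, here is a minimal sketch in standard C11. It is not taken from wdm.h: the names full_barrier_via_exchange, try_acquire and release_lock are made up for illustration. The assumption is that a sequentially consistent atomic_exchange on a memory location is compiled to an XCHG with a memory operand on x86, which is what mainstream compilers typically do, although the C standard itself does not promise any particular instruction.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical helper (not from wdm.h): a barrier built the same way as the
 * x86 KeMemoryBarrier, by exchanging a value with a dummy memory location. */
static inline void full_barrier_via_exchange(void)
{
    /* Dummy location, playing the role of the local `Barrier` variable. */
    atomic_long barrier = 0;

    /* Sequentially consistent exchange. On x86 this is typically emitted as
     * XCHG with a memory operand, which is locked even without a LOCK prefix. */
    (void)atomic_exchange(&barrier, 1L);
}

/* Hypothetical test-and-set lock, in the spirit of the manual's remark that
 * XCHG is useful for semaphores. Returns nonzero if the lock was acquired. */
static int try_acquire(atomic_long *lock_word)
{
    assert(lock_word != NULL);
    /* The old value tells us whether somebody else already held the lock. */
    return atomic_exchange(lock_word, 1L) == 0;
}

/* Releases a lock previously taken with try_acquire. */
static void release_lock(atomic_long *lock_word)
{
    assert(lock_word != NULL);
    atomic_store(lock_word, 0L);
}
```

If all you need is the barrier and not the exchange, atomic_thread_fence(memory_order_seq_cst) is the more direct spelling in C11, and it leaves the choice of instruction to the compiler.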
Another point worth noting is that the compiler-defined processor barrier intrinsics (e.g. __mf, __mb) are guaranteed to follow the semantics defined in the manuals for that processor and also to prevent compiler reordering. That explains why the definition of KeMemoryBarrier on the IA64 platform is simply:
#define KeMemoryBarrier() __mf()
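Standard C11 draws the same line between a compiler-only barrier and a full barrier, which may be a useful mental model. The sketch below is an analogy only: the helper names are invented here, and nothing in the WDK defines KeMemoryBarrierWithoutFence or KeMemoryBarrier in terms of these C11 calls.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical name. Similar in spirit to KeMemoryBarrierWithoutFence():
 * constrains the compiler only, and emits no fence instruction. */
static inline void compiler_only_barrier(void)
{
    atomic_signal_fence(memory_order_seq_cst);
}

/* Hypothetical name. Similar in spirit to KeMemoryBarrier(): constrains the
 * compiler and also asks for a hardware-level fence where one is needed. */
static inline void full_barrier(void)
{
    atomic_thread_fence(memory_order_seq_cst);
}

/* Single-threaded illustration of where each barrier would sit. */
static int write_pair(int *first, int *second)
{
    assert(first != NULL && second != NULL);
    *first = 1;
    compiler_only_barrier();  /* asks the compiler not to move accesses across this point */
    *second = 2;
    full_barrier();           /* additionally orders the accesses at the processor level */
    return *first + *second;
}
```

Which instruction the full fence turns into is the compiler's business, much as __mf() hides the IA64 details behind one name.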
So what have we learned? Well we have learned that KeMemoryBarrier (and MemoryBarrier) does exactly what it says it does - it prevents both compiler and processor reordering. We have also learned that at first glance (and even second glance for that matter) it doesn't appear to do either!
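To make "prevents both compiler and processor reordering" a little more concrete, here is a small sketch of the usual reason to want such a barrier: publishing data to another thread behind a flag. It is written in portable C11 rather than with KeMemoryBarrier, and the mailbox type and function names are hypothetical. The full fences sit where a driver would place KeMemoryBarrier().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical type: one value guarded by a "ready" flag. */
typedef struct {
    long        payload;
    atomic_bool ready;
} mailbox;

/* Writer: store the payload, fence, then raise the flag. The fence keeps the
 * payload store from being reordered after the flag store. */
static void mailbox_publish(mailbox *m, long value)
{
    assert(m != NULL);
    m->payload = value;
    atomic_thread_fence(memory_order_seq_cst);   /* role of KeMemoryBarrier() */
    atomic_store_explicit(&m->ready, true, memory_order_relaxed);
}

/* Reader: check the flag, fence, then read the payload. Returns false if
 * nothing has been published yet. */
static bool mailbox_consume(mailbox *m, long *out)
{
    assert(m != NULL && out != NULL);
    if (!atomic_load_explicit(&m->ready, memory_order_relaxed)) {
        return false;
    }
    atomic_thread_fence(memory_order_seq_cst);   /* role of KeMemoryBarrier() */
    *out = m->payload;
    return true;
}
```

Without the two fences, either the compiler or the processor would be free to let the flag become visible before the payload, which is exactly the kind of reordering the barrier exists to rule out.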
I hope that this topic has helped at least one person in some way shape or form. Thanks for reading.
Anonymous
March 16, 2008
Thank you, for a very informative bit of info ;)
Anonymous
March 17, 2008
"Although the lock prefix is not present in the code - it is implicit in the instruction itself because we are referencing memory. In my opinion it should be explicitly called out in the code as well for clarity but that is just me."
I think you mean your opinion is that it should be called out in comments. If the comment here would say "See page %n% of the Intel IA-32 instruction set manual to see why this works" then I agree. If the comment would say something else then it's a tough question. If comments are subtly incorrect but readers believe them, then bugs don't get fixed. On the other hand, if the manual is subtly incorrect then the same problem occurs. In 1980 an Intel employee told me why LOCK didn't always do a lock and how to work around it, but the manual didn't say.
Anonymous
March 17, 2008
Actually I am saying both. The code should be:
lock xchg Barrier, eax
And the comment should explain that the lock prefix forces a full barrier on all processors <X> and forward. Then it is clear to a reader what is going on. I have spent hours digging through stuff like this just to find out that someone else had already done the research - but didn't bother to put it in the comments.