Push Locks – What are they?

Pushlocks were a new locking primitive first introduced in Windows Server 2003 and are primarily used in place of spinlocks to protect key kernel data structures. Unfortunately, Pushlocks are not documented in the WDK, and are not available for public use; however, a few internal drivers do use them, so you might see them while debugging a machine. Also, I was digging around on MSDN for Pushlocks, and I found that the Filter Manager does expose certain APIs to use Pushlocks, so you are in luck if you are developing a filter driver!

Gate objects

Pushlocks are built on primitive gate objects, which are defined by KGATE structures. The gate object is a highly optimized version of the basic event object. By not having both the notification and synchronization versions of the basic event object, and by being the exclusive object that a thread can wait on, the code for acquiring and releasing a gate is heavily optimized. Gates even have their own dispatcher lock instead of acquiring the entire dispatcher database.

Unlike spinlocks, which must be acquired exclusively for all operations on a data structure, pushlocks can be shared by multiple “readers” and need only be acquired exclusively when a thread needs to modify the data structure.

Operation

When a thread acquires a normal pushlock, the pushlock code marks the pushlock as owned, if it is not owned already. If someone owns the pushlock exclusively, or the thread wants to own the pushlock exclusively while someone else has it in shared mode, the thread allocates a wait block on its stack, initializes a gate object in the wait block, and then add the wait block to the wait list associated with the pushlock. When the thread holding the pushlock finally releases it, it wakes the next waiter by signaling the event in the waiters wait block.

When debugging a machine, there is no easy way to figure out the current owner of the pushlock, apart from doing code review. By looking at the waitlist, you can always figure out the threads trying to get access to it, but since the gate does not keep track of the owner like a regular mutex, it is much harder to find the current owner.

For more details on the operation and structure of a pushlock, please review the Pushlocks section in Windows Internals book, under the System Mechanisms Chapter.

Advantages of using a PushLock

If a pushlock is held by one or more readers, threads that want to modify the data structure are queued for exclusive access. This queuing mechanism provides some of the same benefits of queued spinlocks—for example, FIFO ordering, elimination of race conditions, and reduced cache thrashing when more than one thread is waiting for the pushlock.

Another advantage of using a pushlock is the size. A regular resource object is 56 bytes, however a pushlock is the size of a pointer. Apart from a small memory footprint, this helps especially in the non-contended case, since pushlocks do not require lengthy operations to perform acquisition or release. Because the pushlock fits in one “machine word”, the CPU can use atomic operations to compare and exchange the old lock with the new one.

Pushlocks are also self-optimizing in the sense that the list of threads waiting on a pushlock will be periodically rearranged to provide fairer behavior when the pushlock is released.

Cache Aware Pushlocks

A cache-aware pushlock adds to the basic pushlock by allocating a normal pushlock for each processor in the system and associating it with the cache-aware pushlock. When a thread wants to acquire a cache-aware pushlock for shared access, it simply acquires the pushlock on that processor; however if it needs to acquire the lock for exclusive access, it has to acquire the pushlocks for each processor in exclusive mode.

What does a Pushlock look like?

3: kd> !thread 8c9764c0

THREAD 8c9764c0 Cid 2410.1be4 Teb: 7ff9f000 Win32Thread: e5c6f298 GATEWAIT

Stack Init b386b000 Current b386a978 Base b386b000 Limit b3867000 Call 0

ChildEBP RetAddr Args to Child

b386a990 80833485 8c9764c0 8c9764e4 00000003 nt!KiSwapContext+0x26 (FPO: [Uses EBP] [0,0,4])

b386a9bc 8082ffe0 b06a6a03 e11e0b18 b386aa54 nt!KiSwapThread+0x2e5 (FPO: [Non-Fpo]) (CONV: fastcall)

b386a9e4 8087d722 00000000 e11e0b08 e11e0b18 nt!KeWaitForGate+0x152 (FPO: [Non-Fpo]) (CONV: fastcall)

e11e0b18 00000000 0c050204 7346744e e37b2808 nt!ExfAcquirePushLockExclusive+0x112 (FPO: [Non-Fpo]) (CONV: fastcall)

Above is a snipped output from a dump that I was recently looking at. From the stack, you can see the ExfAcquirePushLockExclusive call trying to acquire the pushlock, which then calls KEWaitForGate. In this case, the lock was already acquired, so this thread allocated a wait block on its stack, and then added itself to the waitlist.

Also, the stack is broken due to the fastcall, therefore the debugger cannot display it entirely. So we can manually reconstruct the stack by passing parameters to the kb command.

k[b|p|P|v] = BasePtr StackPtr InstructionPtr

To get the arguments, we first dump the stack manually using the dps command with the current esp.

3: kd> dps b386a978 l50

b386a978 b386ad40

b386a97c 00000000

b386a980 8088dafe nt!KiSwapContext+0x26

b386a984 b386a9bc

b386a988 b386aa00

b386a98c f773f120

b386a990 8c9764c0

b386a994 80833485 nt!KiSwapThread+0x2e5

b386a998 8c9764c0

b386a99c 8c9764e4

b386a9a0 00000003

b386a9a4 8c9764c0

b386a9a8 00000003

b386a9ac 00000002

b386a9b0 00000002

b386a9b4 f773fa7c

b386a9b8 008c0030

b386a9bc b386a9e4

b386a9c0 8082ffe0 nt!KeWaitForGate+0x152

b386a9c4 b06a6a03

b386a9c8 e11e0b18

b386a9cc b386aa54

b386a9d0 00000000

b386a9d4 8c976504

b386a9d8 00000000

b386a9dc 0000001c

b386a9e0 00000000

b386a9e4 b386aa40

b386a9e8 8087d722 nt!ExfAcquirePushLockExclusive+0x112

b386a9ec 00000000

b386a9f0 e11e0b08

b386a9f4 e11e0b18

b386a9f8 b386aa40

b386a9fc 8096e9a9 nt!SeOpenObjectAuditAlarm+0x1cf

b386aa00 00040007

b386aa04 00000000

b386aa08 8c976568

b386aa0c 8c976568

b386aa10 b06a6a00

b386aa14 b4ee0a00

b386aa18 b127cc10

b386aa1c 00000000

b386aa20 00000001

b386aa24 80a60456 hal!KfLowerIrql+0x62

b386aa28 b386ac04

b386aa2c 8d117800

b386aa30 00000000

b386aa34 00000000

b386aa38 b386aa20

b386aa3c 01943080

b386aa40 b386aa64

b386aa44 808b7a14 nt!CmpCheckRecursionAndRecordThreadInfo+0x2a

From the output above, we can see the stack. To reconstruct the stack, we can get the ebp, esp, and eip from the stack for the ExfAcquirePushLockExclusive frame, and pass it to the kb command. Voila!

3: kd> kb = b386aa40 b386a9e4 8087d722

ChildEBP RetAddr Args to Child

b386aa40 808b7a14 b386ac04 e11e0b18 e11e0b18 nt!ExfAcquirePushLockExclusive+0x112

b386aa64 808b7b09 e11e0b18 b386aa80 e101bf40 nt!CmpCheckRecursionAndRecordThreadInfo+0x2a

b386aaa4 808da118 0000001c b386ab58 00000001 nt!CmpCallCallBacks+0x6b

b386ab90 80937942 e101bf40 00000000 89f13648 nt!CmpParseKey+0xd4

b386ac10 80933a76 00000000 b386ac50 00000040 nt!ObpLookupObjectName+0x5b0

b386ac64 808bb471 00000000 8e930480 00000d01 nt!ObOpenObjectByName+0xea

b386ad50 808897bc 0243eba0 00020019 0243eb68 nt!NtOpenKey+0x1ad

b386ad50 7c8285ec 0243eba0 00020019 0243eb68 nt!KiFastCallEntry+0xfc

WARNING: Frame IP not in any known module. Following frames may be wrong.

0243eba4 00000000 00000000 00000000 00000000 0x7c8285ec