Posting a full root-cause analysis in case it helps triage, and to ask whether this can be escalated to the kernel / pool-allocator team. Offsets from ntkrnlmp.exe 10.0.26100.8457; still reproduces on 26100.8655 after KB5094135 (installed 2026-06-13). Failure hash 6f13343d-8edf-14f9-0269-6df067c74f57, faulting IP nt!ExpPoolTrackerChargeEntry+0x40 (P2 low bits ...96570, invariant across ASLR bases).
Summary: use-after-free in the pool tag tracker. nt!ExAllocateHeapPool has a lock-free fast path that locates a _POOL_TRACKER_TABLE entry from a cached PoolTrackTable base and mask, then calls nt!ExpPoolTrackerChargeEntry to atomically update its accounting fields. The tracker table is dynamically expandable: nt!ExpInsertPoolTrackerExpansion publishes the new base/size under ExpTaggedPoolLock, releases the lock, and then frees the old table with nothing waiting for in-flight lock-free readers to drain. A fast-path charge that snapshotted the old base before an expansion computes an entry pointer into the freed/recycled old table and executes lock xadd against it. If that page has been reused, it bugchecks.
The fault (052826-17906-01.dmp, .cxr at the exception context):
nt!ExpPoolTrackerChargeEntry+0x40:
fffff805`bf996570 f04b0fc12c06 lock xadd qword ptr [r14+r8],rbp ds:002b:ffffa381`b4ba31e8=fffffffffdff47d0
rbp=0000000000000540 r8=ffffa381b4ba31e0 r13=0000000020707249 ("Irp ") r14=0000000000000008
r8 is a _POOL_TRACKER_TABLE entry pointer; r14=8 selects NonPagedBytes; rbp=0x540 is the allocation size being added. [r8+8] holds fffffffffdff47d0, not a plausible byte total. Bugcheck SYSTEM_SERVICE_EXCEPTION (3B), exception c0000005. Call chain is a routine IRP allocation: ExpPoolTrackerChargeEntry <- ExAllocateHeapPool <- ExpAllocatePoolWithTagFromNode <- ExAllocatePool2 <- IopAllocateIrpPrivate (tag "Irp ") <- an NtQueryInformationProcess / file-name query.
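For anyone who wants to sanity-check the register values quoted above, here is a minimal Python sketch. It uses only numbers taken from the dump; the helper name `tag_from_register` is made up for illustration and is not a debugger API. Reading the qword as a signed value is my own interpretation of why it is not a plausible byte total.

```python
import struct

def tag_from_register(value: int) -> str:
    """Pool tags are four ASCII bytes stored little-endian in the low dword."""
    return struct.pack("<I", value & 0xFFFFFFFF).decode("ascii")

r8 = 0xFFFFA381B4BA31E0           # tracker entry pointer from the dump
r14 = 0x8                         # field offset used by the faulting instruction
r13 = 0x0000000020707249          # pool tag register
fault_qword = 0xFFFFFFFFFDFF47D0  # value the dump shows at [r14+r8]

print(repr(tag_from_register(r13)))  # 'Irp '
print(hex(r8 + r14))                 # 0xffffa381b4ba31e8, the ds: address in the fault line

# Assumption: reinterpret the qword as signed. A live byte counter
# would not normally be negative.
signed = struct.unpack("<q", struct.pack("<Q", fault_qword))[0]
print(signed)                        # -33601584
```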
Cross-dump fingerprint (the use-after-free tell). Same instruction and hash; r8 is invalid a different way each crash, which is what freed-and-reused memory looks like:
| Dump | Process | Stop | r8 | What is at r8 now |
|---|---|---|---|---|
| 052826-17906-01 | logioptionsplus | 0x3B | ffffa381b4ba31e0 | freed/reused, [r8+8]=fffffffffdff47d0 |
| 052026-15875-01 | Dropbox | 0x1E | 0072007400730069 | not a pointer; r8/r9/r10 = UTF-16 "istr","y\ma","chin", a ...Registry\Machine... path string in that page |
| 052726-15765-01 | svchost | 0x23 | ffff80805d5b6330 | freed/reused, [r8+8]=ffffffffffff9e20 |
| 052826-18250-01 | powershell | 0x135 | via CmpCallbackFatalFilter | same root function |
Four processes, four stop codes, one root function. The stop code is just whichever exception handler catches the fault on the way up. The Dropbox dump is decisive: the tracker-entry pointer points into a page that has been recycled into a registry path string.
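The string interpretation of the Dropbox r8 value is easy to reproduce. A minimal sketch, decoding only the r8 value quoted in the table (the r9/r10 raw values are not listed in this post, so they are not decoded here); `register_as_utf16` is a hypothetical helper name:

```python
import struct

def register_as_utf16(value: int) -> str:
    """Interpret a 64-bit register as four little-endian UTF-16 code units."""
    return struct.pack("<Q", value).decode("utf-16-le")

r8_dropbox = 0x0072007400730069       # r8 from 052026-15875-01
print(register_as_utf16(r8_dropbox))  # istr
```

"istr" is the middle of "Registry", which is consistent with the page now holding a ...Registry\Machine... path.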
Where the bad pointer is produced (nt!ExAllocateHeapPool). The slow path reads the table base fresh under ExpTaggedPoolLock; the fast path uses a cached snapshot and takes no lock:
mov rdx, qword ptr [rsp+38h] ; cached PoolTrackTable base, NO lock
and eax, r12d ; index &= cached PoolTrackTableMask, NO lock
lea rbx, [rax+rax*4] ; rbx = index*5
shl rbx, 4 ; rbx = index*0x50 (sizeof _POOL_TRACKER_TABLE)
add rbx, rdx ; entry = base + index*0x50
mov r8, rbx
call nt!ExpPoolTrackerChargeEntry
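The address arithmetic in that fast path can be restated in a few lines. This is a sketch of the five instructions above and nothing more; the function name and the base, mask, and index values in the usage lines are hypothetical and not taken from any dump.

```python
ENTRY_SIZE = 0x50  # sizeof(_POOL_TRACKER_TABLE) per the disassembly above

def entry_address(base: int, raw_index: int, mask: int) -> int:
    """Mirror the fast path: and / lea [rax+rax*4] / shl 4 / add."""
    index = raw_index & mask           # and eax, r12d
    scaled = (index + index * 4) << 4  # lea rbx,[rax+rax*4] ; shl rbx,4
    return base + scaled               # add rbx, rdx

# Hypothetical inputs, chosen only to exercise the arithmetic.
base = 0xFFFF900000000000
mask = 0x7FF
entry = entry_address(base, 0x1234, mask)
print(hex(entry - base))  # 0xb040, i.e. 0x234 * 0x50
```

The point is that `base` and `mask` are inputs captured earlier, so the result is only as current as the snapshot.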
Where the old table is freed (nt!ExpInsertPoolTrackerExpansion):
call nt!ExAllocateHeapPages ; new larger table -> rdi
... memcpy(old -> new) ; memset(tail) ...
mov [nt!PoolTrackTableExpansion], rdi ; +bfcde445 publish new base, lock held
mov [nt!PoolTrackTableExpansionSize], r12 ; +bfcde452 publish new size, lock held
call nt!KeReleaseInStackQueuedSpinLock ; +bfcde46b release lock
call nt!ExPoolCleanupExpansionTable(old, oldSize) ; +bfcde47b FREE old table, AFTER unlock
The publish is correctly serialized under ExpTaggedPoolLock. The defect is that reclamation happens after the lock is released, unprotected against in-flight lock-free readers.
The race. CPU A (charge) snapshots base+mask with no lock. CPU B grows the table, publishes the new base/mask under the lock, releases, then frees the old table. CPU A computes old_base + index*0x50 and executes lock xadd/lock inc into the freed table.
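The interleaving can be modeled deterministically. The sketch below is a single-threaded toy, not kernel code: `Table`, `Tracker`, and `charge` are invented names, and the `freed` flag exists only so the model can report the stale write (the real charge path has no such check, which is the whole problem).

```python
class Table:
    """Stand-in for one tracker table allocation."""
    def __init__(self, entries: int):
        self.bytes_charged = [0] * entries
        self.freed = False

class Tracker:
    def __init__(self):
        self.current = Table(4)

    def snapshot(self) -> Table:
        # Fast path: read the base with no lock held.
        return self.current

    def expand_and_free_old(self) -> None:
        # Writer: copy, publish, then free the old table immediately.
        old = self.current
        new = Table(len(old.bytes_charged) * 2)
        new.bytes_charged[:len(old.bytes_charged)] = old.bytes_charged
        self.current = new   # publish (done under the lock in the real code)
        old.freed = True     # free right after the lock is released

def charge(table: Table, index: int, size: int) -> bool:
    """Return False when the write would land in a freed table."""
    if table.freed:
        return False         # model-only check; this is the use-after-free
    table.bytes_charged[index] += size
    return True

tracker = Tracker()
stale = tracker.snapshot()      # CPU A: snapshot the old base
tracker.expand_and_free_old()   # CPU B: grow, publish, free
ok = charge(stale, 1, 0x540)    # CPU A: write through the stale pointer
print(ok)                       # False
```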
Why heavy load is required. A continuous ETW kernel trace shows every crash preceded by a ~3 second burst of ~700,000 kernel events/sec (normal heavy load is 10k-50k/sec), ~42% registry and ~42% file, from a process-creation storm across 24 logical CPUs. That forces tracker-table growth while charges are in flight, and explains the very low public report rate.
Fix direction. The publish is already serialized; the problem is freeing the retired table immediately after unlock. Most localized fix: defer the free until lock-free readers drain (epoch/generation or an RCU-style barrier) instead of calling ExPoolCleanupExpansionTable right after release. Alternatives: bump a generation counter under the lock on each publish and recheck it on the charge path before the atomic write; or carry the index and recompute the entry from the current base at charge time.
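To make the deferred-free idea concrete, here is a single-threaded toy model, assuming the simplest possible drain condition (one global count of in-flight readers). It is not an RCU or epoch implementation and not how the kernel would do it; `DeferredTracker`, `enter`, `leave`, and `expand` are invented names.

```python
class Table:
    """Stand-in for one tracker table allocation."""
    def __init__(self, entries: int):
        self.bytes_charged = [0] * entries
        self.freed = False

class DeferredTracker:
    def __init__(self):
        self.current = Table(4)
        self.active_readers = 0
        self.retired = []    # old tables waiting for readers to drain

    def enter(self) -> Table:
        self.active_readers += 1
        return self.current

    def leave(self) -> None:
        self.active_readers -= 1
        self._reclaim()

    def expand(self) -> None:
        old = self.current
        new = Table(len(old.bytes_charged) * 2)
        new.bytes_charged[:len(old.bytes_charged)] = old.bytes_charged
        self.current = new        # publish
        self.retired.append(old)  # retire instead of freeing right away
        self._reclaim()

    def _reclaim(self) -> None:
        # Free retired tables only when no charge is in flight.
        if self.active_readers == 0:
            for table in self.retired:
                table.freed = True
            self.retired.clear()

tracker = DeferredTracker()
stale = tracker.enter()             # CPU A begins a charge
tracker.expand()                    # CPU B grows and publishes
freed_during_charge = stale.freed   # free was deferred
stale.bytes_charged[1] += 0x540     # write lands in still-valid memory
tracker.leave()                     # CPU A done; retired table reclaimed
print(freed_during_charge, stale.freed)  # False True
```

In this model the late charge still lands in the retired table and is not reflected in the new one, so deferral alone keeps the write safe but not the accounting exact; recomputing the entry from the current base, as suggested above, is the variant that would address that.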
This is not a single bad unit. Multiple independent users now report the identical hash and faulting offset on the Intel Arrow Lake-HX platform (Core Ultra 9 275HX), across different OEMs and GPUs: Alienware 18, Alienware 16x, and Acer Predator. WER bucket telemetry alone may not surface something this rare; correlated multi-user reports with the same hash, routed to a kernel engineer, would. Full kernel dumps, 14 deduplicated minidumps (all same hash), and the ETW captures are available on request.
Caveat: this is reverse-engineered from optimized release ntoskrnl.exe with public symbols and post-mortem dumps. The fault context, the two-path disassembly, the divergent-r8 fingerprint, and the burst correlation are direct evidence; the exact free/publish ordering in the expansion routine is the one inference the team can confirm internally.