I have an update: using a full kernel dump captured when the error recurred, I traced the device behind that specific root port address to an NVMe SSD.
I've reseated the SSD and am now waiting for chkdsk to finish repairing bad sectors.
WHEA Uncorrectable Error 124: PCI Express root port
Hi there, I have been having repeated crashes on an MSI GP66 laptop (i7-10750H, RTX 3070): the screen freezes with specific pixels lit and the rest going black, and then the computer reboots. The dumps consistently report a WHEA uncorrectable error (124). I don't see any temperature spikes, and the crashes recur consistently when gaming (note: NVIDIA Broadcast is usually running in the background).
WHEA_UNCORRECTABLE_ERROR (124)
A fatal hardware error has occurred. Parameter 1 identifies the type of error
source that reported the error. Parameter 2 holds the address of the
nt!_WHEA_ERROR_RECORD structure that describes the error condition. Try !errrec Address of the nt!_WHEA_ERROR_RECORD structure to get more details.
Arguments:
Arg1: 0000000000000004, PCI Express Error
Arg2: ffffa5837ba3a028, Address of the nt!_WHEA_ERROR_RECORD structure.
Arg3: 0000000000000000
Arg4: 0000000000000000
Debugging Details:
KEY_VALUES_STRING: 1
Key : Analysis.CPU.mSec
Value: 3671
Key : Analysis.DebugAnalysisManager
Value: Create
Key : Analysis.Elapsed.mSec
Value: 4577
Key : Analysis.Init.CPU.mSec
Value: 624
Key : Analysis.Init.Elapsed.mSec
Value: 11503
Key : Analysis.Memory.CommitPeak.Mb
Value: 109
Key : WER.OS.Branch
Value: co_release
Key : WER.OS.Timestamp
Value: 2021-06-04T16:28:00Z
Key : WER.OS.Version
Value: 10.0.22000.1
FILE_IN_CAB: MEMORY.DMP
DUMP_FILE_ATTRIBUTES: 0x1000
BUGCHECK_CODE: 124
BUGCHECK_P1: 4
BUGCHECK_P2: ffffa5837ba3a028
BUGCHECK_P3: 0
BUGCHECK_P4: 0
HARDWARE_VENDOR_ID: 2646
HARDWARE_DEVICE_ID: 500F
BLACKBOXBSD: 1 (!blackboxbsd)
BLACKBOXNTFS: 1 (!blackboxntfs)
BLACKBOXPNP: 1 (!blackboxpnp)
BLACKBOXWINLOGON: 1
PROCESS_NAME: System
STACK_TEXT:
fffffe0f0ef9d698 fffff80123d0169b : 0000000000000124 0000000000000004 ffffa5837ba3a028 0000000000000000 : nt!KeBugCheckEx
fffffe0f0ef9d6a0 fffff801229410c0 : 0000000000000000 fffffe0f0ef9d779 ffffa5837ba3a028 ffffa5837ba3a028 : nt!HalBugCheckSystem+0xeb
fffffe0f0ef9d6e0 fffff80123e3fa33 : 0000000000000000 fffffe0f0ef9d779 ffffa5837ba3a028 ffffa5834c651774 : PSHED!PshedBugCheckSystem+0x10
fffffe0f0ef9d710 fffff80128435dee : ffffa5834c5974b0 ffffa5834c5974b0 ffffa5834c651310 000000000000002c : nt!WheaReportHwError+0x393
fffffe0f0ef9d7e0 fffff80128435f23 : fffff80128449680 fffff80128449680 fffff80128449698 ffffa58349eaccb0 : pci!ExpressPcProcessWorkQueueItem+0x2ee
fffffe0f0ef9d8d0 fffff80123b5329f : ffffa5837abe8040 ffffa58349eacc00 fffff80100000000 ffffa58300000000 : pci!ExpressPcWorkQueueWorkerRoutine+0x23
fffffe0f0ef9d900 fffff80123adc845 : ffffa5837abe8040 ffff95806a762000 ffffa5837abe8040 5555ffff42274268 : nt!ExpWorkerThread+0x14f
fffffe0f0ef9daf0 fffff80123c1aa44 : ffff95806a753180 ffffa5837abe8040 fffff80123adc7f0 aa0000003a264247 : nt!PspSystemThreadStartup+0x55
fffffe0f0ef9db40 0000000000000000 : fffffe0f0ef9e000 fffffe0f0ef97000 0000000000000000 0000000000000000 : nt!KiStartSystemThread+0x34
MODULE_NAME: GenuineIntel
IMAGE_NAME: GenuineIntel.sys
STACK_COMMAND: .cxr; .ecxr ; kb
FAILURE_BUCKET_ID: 0x124_4_GenuineIntel_PCIEXPRESS_VENID_2646_DEVID_500F_IMAGE_GenuineIntel.sys
OS_VERSION: 10.0.22000.1
BUILDLAB_STR: co_release
OSPLATFORM_TYPE: x64
OSNAME: Windows 10
FAILURE_ID_HASH: {1d5c882c-d859-d42c-7d51-ef189f140118}
Followup: MachineOwner
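For anyone reading a similar dump, the two fields in the analysis above that point at the failing hardware are Arg1 and FAILURE_BUCKET_ID. Here is a minimal Python sketch of pulling them apart. The function names and the lookup table are my own illustration, not part of any debugger tooling, and the table lists only the first few error source types from Microsoft's bug check 0x124 documentation.

```python
import re

# First few WHEA error source types reported in Parameter 1 of bug check 0x124.
# Assumption: abbreviated for illustration; other values exist.
WHEA_SOURCE_TYPES = {
    0x0: "Machine check exception",
    0x1: "Corrected machine check",
    0x2: "Corrected platform error",
    0x3: "Nonmaskable interrupt",
    0x4: "Uncorrectable PCI Express error",
}

def describe_arg1(arg1_hex):
    """Map the hex string printed for Arg1 to a readable source type."""
    value = int(arg1_hex, 16)
    return WHEA_SOURCE_TYPES.get(value, "Unknown source type {:#x}".format(value))

def parse_bucket_ids(bucket):
    """Extract (vendor_id, device_id) from a FAILURE_BUCKET_ID string, or None."""
    match = re.search(r"VENID_([0-9A-Fa-f]{4})_DEVID_([0-9A-Fa-f]{4})", bucket)
    if match is None:
        return None
    return match.group(1).upper(), match.group(2).upper()

bucket = "0x124_4_GenuineIntel_PCIEXPRESS_VENID_2646_DEVID_500F_IMAGE_GenuineIntel.sys"
print(describe_arg1("0000000000000004"))
print(parse_bucket_ids(bucket))
```

The IDs recovered this way (2646:500F) match the HARDWARE_VENDOR_ID and HARDWARE_DEVICE_ID lines in the dump.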
10: kd> !errrec ffffa5837ba3a028
Common Platform Error Record @ ffffa5837ba3a028
Record Id : 01d84912ef50b956
Severity : Fatal (1)
Length : 812
Creator : Microsoft
Notify Type : PCI Express Error
Timestamp : 4/7/2022 6:28:46 (UTC)
Flags : 0x00000000
===============================================================================
Section 0 : PCI Express
Descriptor @ ffffa5837ba3a0a8
Section @ ffffa5837ba3a180
Offset : 344
Length : 208
Flags : 0x00000001 Primary
Severity : Fatal
Port Type : Root Port
Version : 1.1
Command/Status: 0x0010/0x0406
Device Id :
VenId:DevId : 8086:06ac
Class code : 030400
Function No : 0x04
Device No : 0x1b
Segment : 0x0000
Primary Bus : 0x00
Second. Bus : 0x00
Slot : 0x0000
Dev. Serial # : 0000000000000000
Express Capability Information @ ffffa5837ba3a1b4
Device Caps : 00008001 Role-Based Error Reporting: 1
Device Ctl : 0027 ur FE NF CE
Dev Status : 0011 ur fe nf CE
Root Ctl : 0008 fs nfs cs
AER Information @ ffffa5837ba3a1f0
Uncorrectable Error Status : 00000000 ur ecrc mtlp rof uc ca cto fcp ptlp sd dlp und
Uncorrectable Error Mask : 00000000 ur ecrc mtlp rof uc ca cto fcp ptlp sd dlp und
Uncorrectable Error Severity : 00060011 ur ecrc MTLP ROF uc ca cto fcp ptlp sd DLP UND
Correctable Error Status : 00000000 adv rtto rnro dllp tlp re
Correctable Error Mask : 00000000 adv rtto rnro dllp tlp re
Caps & Control : 00000010 ecrcchken ecrcchkcap ecrcgenen ecrcgencap FEP
Header Log : 4a000001 03000004 00000000 00000000
Root Error Command : 00000000 fen nfen cen
Root Error Status : 00000000 MSG# 00 fer nfer fuf mur ur mcr cer
Correctable Error Source ID : 00,00,00
Correctable Error Source ID : 00,00,00
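To make the AER lines above easier to read, here is a rough Python sketch that decodes the uncorrectable error registers into the same abbreviations WinDbg prints (uppercase means the bit is set). The bit positions follow the PCIe AER capability layout as I understand it, and the helper names are made up for this example. Note that the Severity register only says which error types would be treated as fatal, while the Status register says which ones were actually latched, and here it reads 00000000.

```python
# Uncorrectable error bit positions in the PCIe AER capability,
# labelled with the abbreviations used in the !errrec output.
UNCORRECTABLE_BITS = {
    0: "UND",    # Undefined
    4: "DLP",    # Data Link Protocol Error
    5: "SD",     # Surprise Down
    12: "PTLP",  # Poisoned TLP
    13: "FCP",   # Flow Control Protocol Error
    14: "CTO",   # Completion Timeout
    15: "CA",    # Completer Abort
    16: "UC",    # Unexpected Completion
    17: "ROF",   # Receiver Overflow
    18: "MTLP",  # Malformed TLP
    19: "ECRC",  # ECRC Error
    20: "UR",    # Unsupported Request
}

def decode_uncorrectable(reg):
    """Return the names of the bits that are set, lowest bit first."""
    return [name for bit, name in sorted(UNCORRECTABLE_BITS.items())
            if (reg >> bit) & 1]

def format_bdf(bus, device, function):
    """Format a PCI address as bus:device.function."""
    return "{:02x}:{:02x}.{:x}".format(bus, device, function)

print(decode_uncorrectable(0x00060011))  # Uncorrectable Error Severity
print(decode_uncorrectable(0x00000000))  # Uncorrectable Error Status
print(format_bdf(0x00, 0x1B, 0x04))      # Primary Bus, Device No, Function No
```

The last call turns the Primary Bus, Device No and Function No fields into the usual bus:device.function form, which is the root port address I matched against the SSD.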
1 answer
Ankur Chakravarthy 1 Reputation point
2022-04-09T09:00:55.773+00:00