WHEA Uncorrectable Error (0x124) on a PCI Express root port

Ankur Chakravarthy 1 Reputation point
2022-04-08T09:10:02.377+00:00

Hi there, I have been getting repeated crashes on an MSI GP66 laptop (i7-10750H, RTX 3070). The screen freezes with a few specific pixels lit and the rest black, and then the computer reboots. The dumps consistently report a WHEA uncorrectable error (bug check 0x124). I don't see any temperature spikes, and the crashes recur consistently while gaming (note: NVIDIA Broadcast is usually running in the background).

WHEA_UNCORRECTABLE_ERROR (124)
A fatal hardware error has occurred. Parameter 1 identifies the type of error
source that reported the error. Parameter 2 holds the address of the
nt!_WHEA_ERROR_RECORD structure that describes the error condition. Try !errrec Address of the nt!_WHEA_ERROR_RECORD structure to get more details.
Arguments:
Arg1: 0000000000000004, PCI Express Error
Arg2: ffffa5837ba3a028, Address of the nt!_WHEA_ERROR_RECORD structure.
Arg3: 0000000000000000
Arg4: 0000000000000000

Debugging Details:


KEY_VALUES_STRING: 1

Key  : Analysis.CPU.mSec
Value: 3671

Key  : Analysis.DebugAnalysisManager
Value: Create

Key  : Analysis.Elapsed.mSec
Value: 4577

Key  : Analysis.Init.CPU.mSec
Value: 624

Key  : Analysis.Init.Elapsed.mSec
Value: 11503

Key  : Analysis.Memory.CommitPeak.Mb
Value: 109

Key  : WER.OS.Branch
Value: co_release

Key  : WER.OS.Timestamp
Value: 2021-06-04T16:28:00Z

Key  : WER.OS.Version
Value: 10.0.22000.1

FILE_IN_CAB: MEMORY.DMP

DUMP_FILE_ATTRIBUTES: 0x1000

BUGCHECK_CODE: 124

BUGCHECK_P1: 4

BUGCHECK_P2: ffffa5837ba3a028

BUGCHECK_P3: 0

BUGCHECK_P4: 0

HARDWARE_VENDOR_ID: 2646

HARDWARE_DEVICE_ID: 500F

BLACKBOXBSD: 1 (!blackboxbsd)

BLACKBOXNTFS: 1 (!blackboxntfs)

BLACKBOXPNP: 1 (!blackboxpnp)

BLACKBOXWINLOGON: 1

PROCESS_NAME: System

STACK_TEXT:
fffffe0f0ef9d698 fffff80123d0169b : 0000000000000124 0000000000000004 ffffa5837ba3a028 0000000000000000 : nt!KeBugCheckEx
fffffe0f0ef9d6a0 fffff801229410c0 : 0000000000000000 fffffe0f0ef9d779 ffffa5837ba3a028 ffffa5837ba3a028 : nt!HalBugCheckSystem+0xeb
fffffe0f0ef9d6e0 fffff80123e3fa33 : 0000000000000000 fffffe0f0ef9d779 ffffa5837ba3a028 ffffa5834c651774 : PSHED!PshedBugCheckSystem+0x10
fffffe0f0ef9d710 fffff80128435dee : ffffa5834c5974b0 ffffa5834c5974b0 ffffa5834c651310 000000000000002c : nt!WheaReportHwError+0x393
fffffe0f0ef9d7e0 fffff80128435f23 : fffff80128449680 fffff80128449680 fffff80128449698 ffffa58349eaccb0 : pci!ExpressPcProcessWorkQueueItem+0x2ee
fffffe0f0ef9d8d0 fffff80123b5329f : ffffa5837abe8040 ffffa58349eacc00 fffff80100000000 ffffa58300000000 : pci!ExpressPcWorkQueueWorkerRoutine+0x23
fffffe0f0ef9d900 fffff80123adc845 : ffffa5837abe8040 ffff95806a762000 ffffa5837abe8040 5555ffff42274268 : nt!ExpWorkerThread+0x14f
fffffe0f0ef9daf0 fffff80123c1aa44 : ffff95806a753180 ffffa5837abe8040 fffff80123adc7f0 aa0000003a264247 : nt!PspSystemThreadStartup+0x55
fffffe0f0ef9db40 0000000000000000 : fffffe0f0ef9e000 fffffe0f0ef97000 0000000000000000 0000000000000000 : nt!KiStartSystemThread+0x34

MODULE_NAME: GenuineIntel

IMAGE_NAME: GenuineIntel.sys

STACK_COMMAND: .cxr; .ecxr ; kb

FAILURE_BUCKET_ID: 0x124_4_GenuineIntel_PCIEXPRESS_VENID_2646_DEVID_500F_IMAGE_GenuineIntel.sys

OS_VERSION: 10.0.22000.1

BUILDLAB_STR: co_release

OSPLATFORM_TYPE: x64

OSNAME: Windows 10

FAILURE_ID_HASH: {1d5c882c-d859-d42c-7d51-ef189f140118}

Followup: MachineOwner
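The analysis above already spells out Arg1 and the vendor/device IDs, but when comparing several dumps it can help to pull these fields out programmatically. Below is a minimal Python sketch. The Parameter 1 table lists only the first few error-source types documented for bug check 0x124; `describe_source` and `bucket_ids` are hypothetical helpers, and the assumption that every bucket embeds `VENID_xxxx_DEVID_xxxx` is inferred from this single FAILURE_BUCKET_ID, not from a documented format.

```python
import re

# Parameter 1 of bug check 0x124 (WHEA_UNCORRECTABLE_ERROR) identifies the
# error source type. Only the first few documented values are listed here.
WHEA_SOURCE_TYPES = {
    0x0: "Machine check exception",
    0x1: "Corrected machine check",
    0x2: "Corrected platform error",
    0x3: "Nonmaskable interrupt (NMI)",
    0x4: "PCI Express error",
    0x5: "Generic error",
}


def describe_source(p1: int) -> str:
    """Map BUGCHECK_P1 to a readable error-source name."""
    return WHEA_SOURCE_TYPES.get(p1, f"Other/unknown source type 0x{p1:x}")


# Assumed bucket layout, inferred from the one FAILURE_BUCKET_ID in this dump.
_BUCKET_RE = re.compile(r"VENID_([0-9A-Fa-f]{4})_DEVID_([0-9A-Fa-f]{4})")


def bucket_ids(bucket: str):
    """Return (vendor_id, device_id) as ints, or None if no IDs are embedded."""
    m = _BUCKET_RE.search(bucket)
    if m is None:
        return None
    return int(m.group(1), 16), int(m.group(2), 16)


if __name__ == "__main__":
    bucket = "0x124_4_GenuineIntel_PCIEXPRESS_VENID_2646_DEVID_500F_IMAGE_GenuineIntel.sys"
    print(describe_source(0x4))
    ven, dev = bucket_ids(bucket)
    print(f"{ven:04X}:{dev:04X}")
```

For this dump the sketch prints `PCI Express error` and `2646:500F`, matching the Arg1 description and the HARDWARE_VENDOR_ID/HARDWARE_DEVICE_ID lines.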


10: kd> !errrec ffffa5837ba3a028

Common Platform Error Record @ ffffa5837ba3a028

Record Id : 01d84912ef50b956
Severity : Fatal (1)
Length : 812
Creator : Microsoft
Notify Type : PCI Express Error
Timestamp : 4/7/2022 6:28:46 (UTC)
Flags : 0x00000000

===============================================================================

Section 0 : PCI Express

Descriptor @ ffffa5837ba3a0a8
Section @ ffffa5837ba3a180
Offset : 344
Length : 208
Flags : 0x00000001 Primary
Severity : Fatal

Port Type : Root Port
Version : 1.1
Command/Status: 0x0010/0x0406
Device Id :
VenId:DevId : 8086:06ac
Class code : 030400
Function No : 0x04
Device No : 0x1b
Segment : 0x0000
Primary Bus : 0x00
Second. Bus : 0x00
Slot : 0x0000
Dev. Serial # : 0000000000000000
Express Capability Information @ ffffa5837ba3a1b4
Device Caps : 00008001 Role-Based Error Reporting: 1
Device Ctl : 0027 ur FE NF CE
Dev Status : 0011 ur fe nf CE
Root Ctl : 0008 fs nfs cs

AER Information @ ffffa5837ba3a1f0
Uncorrectable Error Status : 00000000 ur ecrc mtlp rof uc ca cto fcp ptlp sd dlp und
Uncorrectable Error Mask : 00000000 ur ecrc mtlp rof uc ca cto fcp ptlp sd dlp und
Uncorrectable Error Severity : 00060011 ur ecrc MTLP ROF uc ca cto fcp ptlp sd DLP UND
Correctable Error Status : 00000000 adv rtto rnro dllp tlp re
Correctable Error Mask : 00000000 adv rtto rnro dllp tlp re
Caps & Control : 00000010 ecrcchken ecrcchkcap ecrcgenen ecrcgencap FEP
Header Log : 4a000001 03000004 00000000 00000000
Root Error Command : 00000000 fen nfen cen
Root Error Status : 00000000 MSG# 00 fer nfer fuf mur ur mcr cer
Correctable Error Source ID : 00,00,00
Correctable Error Source ID : 00,00,00
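In the `!errrec` output above, uppercase flags mark bits that are set. A small Python sketch can decode the raw register values, assuming the bit layouts of the PCI Express Base Specification (Device Status, the AER Uncorrectable Error registers, the First Error Pointer in AER Capabilities and Control, and the TLP header format for completions). `set_bits`, `decode_bdf`, and `decode_header_log` are hypothetical helpers, not part of any Windows or WinDbg API.

```python
# Bit positions per the PCI Express Base Specification register layouts.
DEV_STATUS_BITS = {
    0: "Correctable Error Detected",
    1: "Non-Fatal Error Detected",
    2: "Fatal Error Detected",
    3: "Unsupported Request Detected",
    4: "AUX Power Detected",
    5: "Transactions Pending",
}

# AER Uncorrectable Error Status/Mask/Severity share this bit layout.
UNCORR_BITS = {
    0: "Undefined (formerly Training Error)",
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request Error",
}


def set_bits(value: int, names: dict) -> list:
    """Names of the known bits that are set in value, lowest bit first."""
    return [name for bit, name in sorted(names.items()) if (value >> bit) & 1]


def decode_bdf(rid: int) -> str:
    """Format a 16-bit requester/completer ID as bus:device.function."""
    return f"{rid >> 8:02x}:{(rid >> 3) & 0x1F:02x}.{rid & 0x7}"


def decode_header_log(dw0: int, dw1: int) -> dict:
    """Decode the first two DWs of a logged TLP header (byte 0 in bits 31:24)."""
    fmt_type = dw0 >> 24
    info = {"fmt": fmt_type >> 5, "type": fmt_type & 0x1F, "length_dw": dw0 & 0x3FF}
    if fmt_type in (0x0A, 0x4A):  # Cpl (no data) / CplD (with data)
        info["kind"] = "CplD" if fmt_type == 0x4A else "Cpl"
        info["completer"] = decode_bdf(dw1 >> 16)
        info["status"] = (dw1 >> 13) & 0x7  # 0 = Successful Completion
        info["byte_count"] = dw1 & 0xFFF
    return info


if __name__ == "__main__":
    print(set_bits(0x0011, DEV_STATUS_BITS))        # Dev Status
    print(set_bits(0x00060011, UNCORR_BITS))        # Uncorrectable Error Severity
    print("First Error Pointer:", UNCORR_BITS.get(0x00000010 & 0x1F))
    print(decode_header_log(0x4A000001, 0x03000004))
```

Fed this record's values, the sketch reports Correctable Error Detected and AUX Power Detected in Device Status; Undefined, Data Link Protocol Error, Receiver Overflow and Malformed TLP as the errors the port is configured to treat as fatal; a First Error Pointer of 16 (Unexpected Completion); and a logged header that reads as a one-DW completion with data (CplD) from 03:00.0 with successful status. Because the Uncorrectable Error Status register reads zero in this record, the header log and pointer may be stale, so treat them as hints rather than a diagnosis.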

Ankur Chakravarthy 1 Reputation point
2022-04-09T09:00:55.773+00:00

Update: using a full kernel dump from a recurrence of the error, I traced the device connected to that root port to an NVMe SSD.
I've reseated the SSD and am waiting for chkdsk to repair bad sectors.
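The root port's own address is in the `!errrec` record (Segment 0x0000, Primary Bus 0x00, Device No 0x1b, Function No 0x04), and converting it to the forms other tools display can help when looking for the port on the running system. This is a minimal sketch; `root_port_location` is a hypothetical helper, and the Device Manager string is an assumption based on how its Location field usually reads (decimal bus, device and function numbers).

```python
def root_port_location(segment: int, bus: int, device: int, function: int):
    """Return (segment:bus:device.function notation, Device Manager-style text)."""
    # Conventional hex notation, as printed by tools such as lspci.
    bdf = f"{segment:04x}:{bus:02x}:{device:02x}.{function:x}"
    # Assumed Device Manager "Location" format, which uses decimal numbers.
    location = f"PCI bus {bus}, device {device}, function {function}"
    return bdf, location


if __name__ == "__main__":
    # Segment, Primary Bus, Device No and Function No copied from the !errrec output.
    bdf, location = root_port_location(0x0000, 0x00, 0x1B, 0x04)
    print(bdf)
    print(location)
```

This yields `0000:00:1b.4` and `PCI bus 0, device 27, function 4`. With Device Manager set to View > Devices by connection, the devices listed under the matching PCI Express root port are the ones behind it.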