WHEA_UNCORRECTABLE_ERROR - AuthenticAMD.sys

Joel-mic 0 Reputation points
2023-04-05T19:13:12.4566667+00:00

I'm trying to get a "render farm" going using Nvidia GPUs and Octane Render. On Windows Server 2019, I get BSODs and reboots while running render benchmarks (OctaneBench). It can usually make it a few seconds into the test, and further if I power-limit the GPUs, but the results are inconsistent. Enabling only 1 or 2 of the 4 GPUs, and sometimes 3, lets me complete the benchmark, but with all 4 enabled the machine crashes.

Hardware:
AMD Epyc 7232P processor
ASRock ROMED8-2T/BCM motherboard
4x RTX 3090 Founders Edition GPUs

I don't have much knowledge to make sense of the minidump file, but here's the text of one:


Microsoft (R) Windows Debugger Version 10.0.25200.1003 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.


Loading Dump File [D:\Polymath Dropbox\Joel Gautraud\Adobe After Effects Auto-Save\040523-15343-01.dmp]
Mini Kernel Dump File: Only registers and stack trace are available

Symbol search path is: srv*
Executable search path is: 
Windows 10 Kernel Version 17763 MP (16 procs) Free x64
Product: Server, suite: TerminalServer SingleUserTS
Machine Name:
Kernel base = 0xfffff801`4e60f000 PsLoadedModuleList = 0xfffff801`4ea274d0
Debug session time: Wed Apr  5 12:56:40.820 2023 (UTC - 4:00)
System Uptime: 0 days 0:00:04.829
Loading Kernel Symbols
..............................................................
Loading User Symbols
Mini Kernel Dump does not contain unloaded driver list
For analysis of this file, run !analyze -v
nt!WheapCreateLiveTriageDump+0x7b:
fffff801`4eee9d77 48895c2438      mov     qword ptr [rsp+38h],rbx ss:0018:ffffdf8e`98acb5d8=ffff920700000a60
12: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

WHEA_UNCORRECTABLE_ERROR (124)
A fatal hardware error has occurred. Parameter 1 identifies the type of error
source that reported the error. Parameter 2 holds the address of the
nt!_WHEA_ERROR_RECORD structure that describes the error condition. Try !errrec Address of the nt!_WHEA_ERROR_RECORD structure to get more details.
Arguments:
Arg1: 0000000000000007, BOOT Error
Arg2: ffff92076e513d68, Address of the nt!_WHEA_ERROR_RECORD structure.
Arg3: 0000000000000000
Arg4: 0000000000000000

Debugging Details:
------------------


KEY_VALUES_STRING: 1

    Key  : Analysis.CPU.mSec
    Value: 2108

    Key  : Analysis.DebugAnalysisManager
    Value: Create

    Key  : Analysis.Elapsed.mSec
    Value: 3250

    Key  : Analysis.IO.Other.Mb
    Value: 10

    Key  : Analysis.IO.Read.Mb
    Value: 0

    Key  : Analysis.IO.Write.Mb
    Value: 17

    Key  : Analysis.Init.CPU.mSec
    Value: 1358

    Key  : Analysis.Init.Elapsed.mSec
    Value: 41515

    Key  : Analysis.Memory.CommitPeak.Mb
    Value: 77

    Key  : Bugcheck.Code.DumpHeader
    Value: 0x124

    Key  : Bugcheck.Code.Register
    Value: 0x98acb5e0


FILE_IN_CAB:  040523-15343-01.dmp

BUGCHECK_CODE:  124

BUGCHECK_P1: 7

BUGCHECK_P2: ffff92076e513d68

BUGCHECK_P3: 0

BUGCHECK_P4: 0

CUSTOMER_CRASH_COUNT:  1

PROCESS_NAME:  System

STACK_TEXT:  
ffffdf8e`98acb5a0 fffff801`4eb8c089     : ffff9207`6ad23040 ffff9207`6e513d40 fffff801`4ea13580 00000000`00000000 : nt!WheapCreateLiveTriageDump+0x7b
ffffdf8e`98acbad0 fffff801`4e92d168     : ffff9207`6e513d40 fffff801`4e725711 00000000`00000000 00000000`00000000 : nt!WheapCreateTriageDumpFromPreviousSession+0x2d
ffffdf8e`98acbb00 fffff801`4e92de7b     : fffff801`4ea13520 fffff801`4ea13580 fffff801`4ea175e0 ffff9207`67afa960 : nt!WheapProcessWorkQueueItem+0x48
ffffdf8e`98acbb40 fffff801`4e6c01ba     : ffff9207`67cc0920 ffff9207`6ad23040 ffff9207`67cc0900 ffff9207`00000000 : nt!WheapWorkQueueWorkerRoutine+0x2b
ffffdf8e`98acbb70 fffff801`4e741ed5     : ffff9207`6ad23040 ffff9207`67bf5300 ffff9207`6ad23040 00000000`00000000 : nt!ExpWorkerThread+0x16a
ffffdf8e`98acbc10 fffff801`4e7d051c     : fffff801`4d0f1180 ffff9207`6ad23040 fffff801`4e741e80 00000000`00000000 : nt!PspSystemThreadStartup+0x55
ffffdf8e`98acbc60 00000000`00000000     : ffffdf8e`98acc000 ffffdf8e`98ac6000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x1c


MODULE_NAME: AuthenticAMD

IMAGE_NAME:  AuthenticAMD.sys

STACK_COMMAND:  .cxr; .ecxr ; kb

FAILURE_BUCKET_ID:  0x124_7_AuthenticAMD_PROCESSOR__UNKNOWN_IMAGE_AuthenticAMD.sys

OSPLATFORM_TYPE:  x64

OSNAME:  Windows 10

FAILURE_ID_HASH:  {9a3989b5-afe5-d9f8-5fed-f06a563b7314}

Followup:     MachineOwner
---------

12: kd> lmvm AuthenticAMD
Browse full module list
start             end                 module name
Mini Kernel Dump does not contain unloaded driver list

1 answer

  1. Limitless Technology 44,346 Reputation points
    2023-04-06T12:35:29.2266667+00:00

    Hello there,

    Do you get any Event ID in the event log, so we can narrow things down?

    In the BIOS, under the Advanced tab, there's an option for Precision Boost Overdrive. Try disabling it.

    The WHEA_UNCORRECTABLE_ERROR blue screen (BSOD) that names AuthenticAMD.sys is typically triggered by an outdated BIOS or AMD graphics driver, by Fast Startup being enabled, or by a hardware failure.

    The WHEA_UNCORRECTABLE_ERROR bug check has a value of 0x00000124. It indicates that a fatal hardware error has occurred, and it uses the error data provided by the Windows Hardware Error Architecture (WHEA).

    Hope this resolves your query. If the reply is helpful, please upvote and accept it as an answer.
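
For readers comparing several of these minidumps, here is a minimal Python sketch that pulls the bug check code and its first two parameters out of pasted `!analyze -v` text and names the WHEA error source. The function name `summarize_bugcheck` and the `SAMPLE` string are illustrative assumptions, not part of any Microsoft tool, and the source table is only a partial list of the documented WHEA error source types. In the dump above, parameter 1 is 7 (BOOT), the reported uptime is under five seconds, and the stack contains nt!WheapCreateTriageDumpFromPreviousSession, which suggests the error record was carried over from the previous session. The debugger's own hint, `!errrec` on the address in parameter 2, is the way to see the actual record.

```python
import re

# Partial table of WHEA error source types (assumption: values 0-7 only;
# newer Windows versions define more sources than are listed here).
WHEA_ERROR_SOURCES = {
    0: "MCE",
    1: "CMC",
    2: "CPE",
    3: "NMI",
    4: "PCIe",
    5: "Generic",
    6: "INIT",
    7: "BOOT",
}


def _hex_field(name, text):
    """Return the hex value of a 'NAME: value' line, or None if absent."""
    match = re.search(
        rf"^{re.escape(name)}:[ \t]*([0-9a-fA-F]+)[ \t]*$", text, re.MULTILINE
    )
    return int(match.group(1), 16) if match else None


def summarize_bugcheck(analyze_text):
    """Summarize the BUGCHECK_* lines of WinDbg '!analyze -v' output."""
    code = _hex_field("BUGCHECK_CODE", analyze_text)
    p1 = _hex_field("BUGCHECK_P1", analyze_text)
    p2 = _hex_field("BUGCHECK_P2", analyze_text)
    source = None
    if code == 0x124 and p1 is not None:
        # For bug check 0x124, parameter 1 is the WHEA error source type.
        source = WHEA_ERROR_SOURCES.get(p1, f"unknown ({p1:#x})")
    return {"code": code, "source": source, "record_address": p2}


# Lines copied from the dump in the question.
SAMPLE = """
BUGCHECK_CODE:  124

BUGCHECK_P1: 7

BUGCHECK_P2: ffff92076e513d68
"""

if __name__ == "__main__":
    summary = summarize_bugcheck(SAMPLE)
    print(f"bug check {summary['code']:#x}, source {summary['source']}, "
          f"record at {summary['record_address']:#x}")
```

A summary like this only helps with sorting dumps by error source. It does not decode the hardware error record itself.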
