Windows Server 2019 authenticamd.sys WHEA_UNCORRECTABLE_ERROR

Joel-mic 0 Reputation points
2023-04-05T19:51:23.2466667+00:00

I'm trying to get a "render farm" going using Nvidia GPUs and Octane Render. On Windows Server 2019, I get BSODs and reboots while running render benchmarks (OctaneBench). The machine usually makes it a few seconds into the test, and further if I power-limit the GPUs, but the results are inconsistent. With only 1 or 2 of the 4 GPUs enabled, and sometimes with 3, the benchmark can complete. With all 4 enabled, the machine crashes.

  • AMD Epyc 7232P processor
  • Asrock ROMED8-2T/BCM motherboard
  • 4x RTX 3090 Founders Edition GPUs

I don't have much knowledge to make sense of the minidump file, but here's the text of one:

Microsoft (R) Windows Debugger Version 10.0.25200.1003 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.

Loading Dump File [D:\Polymath Dropbox\Joel Gautraud\Adobe After Effects Auto-Save\040523-15343-01.dmp]
Mini Kernel Dump File: Only registers and stack trace are available

Symbol search path is: srv*
Executable search path is:
Windows 10 Kernel Version 17763 MP (16 procs) Free x64
Product: Server, suite: TerminalServer SingleUserTS
Machine Name:
Kernel base = 0xfffff8014e60f000 PsLoadedModuleList = 0xfffff8014ea274d0
Debug session time: Wed Apr 5 12:56:40.820 2023 (UTC - 4:00)
System Uptime: 0 days 0:00:04.829
Loading Kernel Symbols
..............................................................
Loading User Symbols
Mini Kernel Dump does not contain unloaded driver list
For analysis of this file, run !analyze -v
nt!WheapCreateLiveTriageDump+0x7b:
fffff8014eee9d77 48895c2438 mov qword ptr [rsp+38h],rbx ss:0018:ffffdf8e98acb5d8=ffff920700000a60
12: kd> !analyze -v


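*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************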

WHEA_UNCORRECTABLE_ERROR (124)
A fatal hardware error has occurred. Parameter 1 identifies the type of error source that reported the error. Parameter 2 holds the address of the nt!_WHEA_ERROR_RECORD structure that describes the error condition. Try !errrec Address of the nt!_WHEA_ERROR_RECORD structure to get more details.
Arguments:
Arg1: 0000000000000007, BOOT Error
Arg2: ffff92076e513d68, Address of the nt!_WHEA_ERROR_RECORD structure.
Arg3: 0000000000000000
Arg4: 0000000000000000

Debugging Details:

KEY_VALUES_STRING: 1
Key : Analysis.CPU.mSec Value: 2108
Key : Analysis.DebugAnalysisManager Value: Create
Key : Analysis.Elapsed.mSec Value: 3250
Key : Analysis.IO.Other.Mb Value: 10
Key : Analysis.IO.Read.Mb Value: 0
Key : Analysis.IO.Write.Mb Value: 17
Key : Analysis.Init.CPU.mSec Value: 1358
Key : Analysis.Init.Elapsed.mSec Value: 41515
Key : Analysis.Memory.CommitPeak.Mb Value: 77
Key : Bugcheck.Code.DumpHeader Value: 0x124
Key : Bugcheck.Code.Register Value: 0x98acb5e0

FILE_IN_CAB: 040523-15343-01.dmp
BUGCHECK_CODE: 124
BUGCHECK_P1: 7
BUGCHECK_P2: ffff92076e513d68
BUGCHECK_P3: 0
BUGCHECK_P4: 0
CUSTOMER_CRASH_COUNT: 1
PROCESS_NAME: System

STACK_TEXT:
ffffdf8e98acb5a0 fffff8014eb8c089 : ffff92076ad23040 ffff92076e513d40 fffff8014ea13580 0000000000000000 : nt!WheapCreateLiveTriageDump+0x7b
ffffdf8e98acbad0 fffff8014e92d168 : ffff92076e513d40 fffff8014e725711 0000000000000000 0000000000000000 : nt!WheapCreateTriageDumpFromPreviousSession+0x2d
ffffdf8e98acbb00 fffff8014e92de7b : fffff8014ea13520 fffff8014ea13580 fffff8014ea175e0 ffff920767afa960 : nt!WheapProcessWorkQueueItem+0x48
ffffdf8e98acbb40 fffff8014e6c01ba : ffff920767cc0920 ffff92076ad23040 ffff920767cc0900 ffff920700000000 : nt!WheapWorkQueueWorkerRoutine+0x2b
ffffdf8e98acbb70 fffff8014e741ed5 : ffff92076ad23040 ffff920767bf5300 ffff92076ad23040 0000000000000000 : nt!ExpWorkerThread+0x16a
ffffdf8e98acbc10 fffff8014e7d051c : fffff8014d0f1180 ffff92076ad23040 fffff8014e741e80 0000000000000000 : nt!PspSystemThreadStartup+0x55
ffffdf8e98acbc60 0000000000000000 : ffffdf8e98acc000 ffffdf8e98ac6000 0000000000000000 0000000000000000 : nt!KiStartSystemThread+0x1c

MODULE_NAME: AuthenticAMD
IMAGE_NAME: AuthenticAMD.sys
STACK_COMMAND: .cxr; .ecxr ; kb
FAILURE_BUCKET_ID: 0x124_7_AuthenticAMD_PROCESSOR__UNKNOWN_IMAGE_AuthenticAMD.sys
OSPLATFORM_TYPE: x64
OSNAME: Windows 10
FAILURE_ID_HASH: {9a3989b5-afe5-d9f8-5fed-f06a563b7314}
Followup: MachineOwner

12: kd> lmvm AuthenticAMD
Browse full module list
start end module name
Mini Kernel Dump does not contain unloaded driver list
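
The dump text itself suggests the next debugger step: run `!errrec` on the address in parameter 2. As a minimal sketch (not part of the original post), the Python below pulls the bug check fields out of pasted `!analyze -v` text and builds that command. The function names `parse_analyze_output` and `errrec_command` are hypothetical, and the sample string is copied from the dump above.

```python
import re

def parse_analyze_output(text):
    """Extract bug check fields from pasted WinDbg `!analyze -v` text."""
    fields = {}
    # WinDbg prints these values as hex without a 0x prefix.
    for key in ("BUGCHECK_CODE", "BUGCHECK_P1", "BUGCHECK_P2",
                "BUGCHECK_P3", "BUGCHECK_P4"):
        match = re.search(rf"\b{key}:\s*([0-9a-fA-F]+)", text)
        if match:
            fields[key] = int(match.group(1), 16)
    match = re.search(r"FAILURE_BUCKET_ID:\s*(\S+)", text)
    if match:
        fields["FAILURE_BUCKET_ID"] = match.group(1)
    return fields

def errrec_command(fields):
    """Build the `!errrec` command for a 0x124 dump, or return None."""
    if fields.get("BUGCHECK_CODE") == 0x124 and "BUGCHECK_P2" in fields:
        # Parameter 2 is the address of the nt!_WHEA_ERROR_RECORD structure.
        return f"!errrec {fields['BUGCHECK_P2']:x}"
    return None

# Sample text copied from the dump in this post.
SAMPLE = (
    "BUGCHECK_CODE: 124 BUGCHECK_P1: 7 BUGCHECK_P2: ffff92076e513d68 "
    "BUGCHECK_P3: 0 BUGCHECK_P4: 0 "
    "FAILURE_BUCKET_ID: 0x124_7_AuthenticAMD_PROCESSOR__UNKNOWN_IMAGE_AuthenticAMD.sys "
    "OSPLATFORM_TYPE: x64"
)

if __name__ == "__main__":
    info = parse_analyze_output(SAMPLE)
    print(errrec_command(info))
```

The printed command would be entered at the `kd>` prompt with the same dump loaded. Whether a minidump contains the full error record is not something this sketch checks.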


1 answer

  1. Limitless Technology 44,766 Reputation points
    2023-04-06T09:19:00.1366667+00:00
    Hello,
    
    Thank you for your question and for reaching out. I understand you are having issues related to a BSOD.
    
    From the logs, it looks like the issue is with the authenticamd.sys file.
    
    1. Temporarily disable any antivirus program or the Windows firewall.
    
    2. Clean up the Temp folders below: open Start -> Run, type each location in turn, and press Enter.
        ->  C:\Windows\Temp
        ->  %USERPROFILE%\AppData\Local\Temp
    
    3. Run Disk Cleanup: open the Properties of the C:\ drive -> General -> Disk Cleanup -> Clean up system files.
    
    4. Run sfc /scannow from an elevated prompt.
    
    5. Run the DISM commands below from an elevated prompt.
    
    DISM /Online /Cleanup-Image /CheckHealth
    DISM /Online /Cleanup-Image /ScanHealth
    DISM /Online /Cleanup-Image /RestoreHealth
    
    6. Disable fast startup (by turning off hibernation) with the command below.
    
    Powercfg -h off
    
    7. Update the BIOS firmware (from the motherboard vendor) and the AMD chipset drivers (from the AMD website).
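
The command-line steps above (4 to 6) could be run in sequence from a script. This is a minimal sketch, assuming Python is available on the server and the script is launched from an elevated prompt. The name `run_repairs` and its `dry_run` flag are hypothetical. The commands themselves are the ones listed in the answer.

```python
import subprocess

# Commands from steps 4 to 6 of the answer, in the order given there.
REPAIR_COMMANDS = [
    ["sfc", "/scannow"],
    ["DISM", "/Online", "/Cleanup-Image", "/CheckHealth"],
    ["DISM", "/Online", "/Cleanup-Image", "/ScanHealth"],
    ["DISM", "/Online", "/Cleanup-Image", "/RestoreHealth"],
    ["powercfg", "-h", "off"],
]

def run_repairs(commands=REPAIR_COMMANDS, dry_run=True):
    """Run each command in order and stop at the first failure.

    Returns a list of (command line, return code) pairs. In dry-run mode
    nothing is executed and every return code is None.
    """
    results = []
    for cmd in commands:
        line = " ".join(cmd)
        if dry_run:
            results.append((line, None))
            continue
        completed = subprocess.run(cmd)
        results.append((line, completed.returncode))
        if completed.returncode != 0:
            break  # Do not continue past a failed step.
    return results

if __name__ == "__main__":
    # Dry run by default: only list what would be executed.
    for line, code in run_repairs(dry_run=True):
        print(line, code)
```

Calling `run_repairs(dry_run=False)` would execute the commands for real. The dry run is the default so the list can be reviewed first.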
    
    --If the reply is helpful, please Upvote and Accept as answer--
    
