
BSOD (DPC_WATCHDOG_VIOLATION / VIDEO_TDR_FAILURE) due to graphics driver or faulty hardware?

Anonymous
2020-11-08T15:45:30+00:00

For quite some time I have regularly been getting BSODs with DPC_WATCHDOG_VIOLATION (133) and, after some driver changes, VIDEO_TDR_FAILURE (116).

  • DPC_WATCHDOG_VIOLATION (133): My screen and audio freeze. Sometimes they come back after a few seconds, sometimes it ends in a Blue Screen (Green Screen for me).
  • VIDEO_TDR_FAILURE (116): The screen turns black but audio keeps going. This continues until I press a key or a mouse button; then the system immediately hard reboots.

At first it felt like it was coming from the network card / driver of the mainboard since it most often happens when I do something in the Edge browser (the new Chromium based one).

I already turned off hardware acceleration. That made no difference.

I have already put a ton of time into this issue, not counting the time (days) it has cost me through crashes and reboots during work (home office).

Based on my analysis of the memory dumps, I concluded that the problem comes from my EVGA NVIDIA GTX 1080 graphics card or its driver. I have now confirmed this by disabling the graphics card completely (Device Manager -> GPU -> Disable) and working only with the onboard graphics of my CPU. Since then, all issues, including the system stuttering I experienced before, have gone away. To clarify: I have been using this graphics card for over 3 years and the issues only started some months ago. I'm not aware of any change I made to the system that could have caused this.

To fix it I tried the following (in bold everything that might actually be relevant based on my findings with the GPU):

  • update motherboard BIOS
  • install all available drivers from the motherboard support page (IRST, ....)
  • try different versions of the network driver (even unofficial ones)
  • disable hardware acceleration in Microsoft Edge
  • free up storage (free storage on C now ~200 GB)
  • completely reinstalled Windows
  • boot into safe mode, uninstall the NVIDIA drivers with DDU and install
    • the newest available NVIDIA driver (457.09)
    • the oldest available NVIDIA driver (440.97)

I saved and analyzed many memory dumps. With the driver changes mentioned above I noticed changes in the error behavior, but the failure did not go away.

I am now trying to determine whether my GPU is defective (hardware failure) or whether there is an issue with the driver (or with the combination of Windows and the driver).

I hope that you can help me with that. I no longer have warranty on the graphics card and really would like to avoid having to buy a new GPU at the moment.

See different memory dumps and my system specs below.

Memory dumps (2) with the newest available Nvidia driver (457.09):

DPC_WATCHDOG_VIOLATION (133)

The DPC watchdog detected a prolonged run time at an IRQL of DISPATCH_LEVEL

or above.

Arguments:

Arg1: 0000000000000000, A single DPC or ISR exceeded its time allotment. The offending

component can usually be identified with a stack trace.

Arg2: 0000000000000501, The DPC time count (in ticks).

Arg3: 0000000000000500, The DPC time allotment (in ticks).

Arg4: fffff806520fb320, cast to nt!DPC_WATCHDOG_GLOBAL_TRIAGE_BLOCK, which contains

additional information regarding this single DPC timeout

SYMBOL_NAME:  dxgmms2!VidSchiWorkerThreadTimerCallback+46

MODULE_NAME: dxgmms2

IMAGE_NAME:  dxgmms2.sys

IMAGE_VERSION:  10.0.19041.546

STACK_COMMAND:  .thread ; .cxr ; kb

BUCKET_ID_FUNC_OFFSET:  46

FAILURE_BUCKET_ID:  0x133_DPC_dxgmms2!VidSchiWorkerThreadTimerCallback

OS_VERSION:  10.0.19041.1

BUILDLAB_STR:  vb_release

OSPLATFORM_TYPE:  x64

OSNAME:  Windows 10

FAILURE_ID_HASH:  {7c8b50ff-53da-11c4-a8b3-cef8f5e196be}

 # Child-SP          RetAddr               Call Site

00 ffffde801e7d9c88 fffff80651887372     nt!KeBugCheckEx

01 ffffde801e7d9c90 fffff8065172c2cd     nt!KeAccumulateTicks+0x15e2c2

02 ffffde801e7d9cf0 fffff8065172c871     nt!KiUpdateRunTime+0x5d

03 ffffde801e7d9d40 fffff806517266e3     nt!KiUpdateTime+0x4a1

04 ffffde801e7d9e80 fffff8065172eff2     nt!KeClockInterruptNotify+0x2e3

05 ffffde801e7d9f30 fffff8065162ecd5     nt!HalpTimerClockInterrupt+0xe2

06 ffffde801e7d9f60 fffff806517f6cba     nt!KiCallInterruptServiceRoutine+0xa5

07 ffffde801e7d9fb0 fffff806517f7227     nt!KiInterruptSubDispatchNoLockNoEtw+0xfa

08 ffffb60d2ec848f0 fffff806516becdb     nt!KiInterruptDispatchNoLockNoEtw+0x37

09 ffffb60d2ec84a80 fffff8065170bbda     nt!KeYieldProcessorEx+0x1b

0a ffffb60d2ec84a90 fffff80651709c53     nt!KxWaitForLockOwnerShip+0x2a

0b ffffb60d2ec84ac0 fffff8065cd33696     nt!KeAcquireInStackQueuedSpinLockAtDpcLevel+0x73

0c ffffb60d2ec84af0 fffff806516bdef9     dxgmms2!VidSchiWorkerThreadTimerCallback+0x46

0d ffffb60d2ec84b50 fffff806516bd735     nt!KiExpireTimer2+0x429

0e ffffb60d2ec84c60 fffff806516e4cc4     nt!KiTimer2Expiration+0x165

0f ffffb60d2ec84d20 fffff806517fc255     nt!KiRetireDpcList+0x874

10 ffffb60d2ec84fb0 fffff806517fc040     nt!KxRetireDpcList+0x5

11 ffffb60d2eecd910 fffff806517fb70e     nt!KiDispatchInterruptContinue

12 ffffb60d2eecd940 fffff8065165095a     nt!KiDpcInterrupt+0x2ee

13 ffffb60d2eecdad0 fffff8065165086c     nt!KiExitThreadWait+0x4a

14 ffffb60d2eecdb10 fffff806517074a7     nt!KiFastExitThreadWait+0x40

15 ffffb60d2eecdb40 fffff80655850d4f     nt!KeDelayExecutionThread+0x3b7

16 ffffb60d2eecdbd0 fffff806516a29a5     iaStorAVC!EventQueue::main+0xc3

17 ffffb60d2eecdc10 fffff806517fc868     nt!PspSystemThreadStartup+0x55

18 ffffb60d2eecdc60 0000000000000000     nt!KiStartSystemThread+0x28

See memory dump file here: https://1drv.ms/u/s!Ap66x80gdX2Blch_LmzbCHr9Na5LHw?e=40aJne

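The DPC spins in KeAcquireInStackQueuedSpinLockAtDpcLevel, waiting for a spinlock that is apparently never released. I suspect the NVIDIA driver holds that lock. That is probably what causes the DPC_WATCHDOG_VIOLATION in this case.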
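To make the Arg2/Arg3 values of the dump above easier to read, here is a minimal Python sketch that converts them from hex to ticks. The function name `decode_dpc_watchdog` is made up for illustration, and the tick length of 15.625 ms is only an assumption (a commonly cited default clock interval); the dump itself does not state it.

```python
def decode_dpc_watchdog(arg2_hex, arg3_hex, tick_ms=15.625):
    """Decode the tick counters of a 0x133 bugcheck with Arg1 == 0.

    tick_ms is an ASSUMED clock interval, not a value read from the dump.
    """
    count = int(arg2_hex, 16)       # Arg2: DPC time count (in ticks)
    allotment = int(arg3_hex, 16)   # Arg3: DPC time allotment (in ticks)
    return {
        "count_ticks": count,
        "allotment_ticks": allotment,
        "exceeded": count > allotment,
        # Only as accurate as the assumed tick length.
        "allotment_seconds": allotment * tick_ms / 1000.0,
    }


info = decode_dpc_watchdog("0000000000000501", "0000000000000500")
print(info)
```

Under that assumption, the single DPC ran one tick past an allotment of 1280 ticks, which is why the watchdog fired.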

DPC_WATCHDOG_VIOLATION (133)

The DPC watchdog detected a prolonged run time at an IRQL of DISPATCH_LEVEL

or above.

Arguments:

Arg1: 0000000000000000, A single DPC or ISR exceeded its time allotment. The offending

component can usually be identified with a stack trace.

Arg2: 0000000000000501, The DPC time count (in ticks).

Arg3: 0000000000000500, The DPC time allotment (in ticks).

Arg4: fffff8021eafb320, cast to nt!DPC_WATCHDOG_GLOBAL_TRIAGE_BLOCK, which contains

additional information regarding this single DPC timeout

SYMBOL_NAME:  dxgkrnl!DpiFdoDpcForIsr+37

MODULE_NAME: dxgkrnl

IMAGE_NAME:  dxgkrnl.sys

STACK_COMMAND:  .thread ; .cxr ; kb

BUCKET_ID_FUNC_OFFSET:  37

FAILURE_BUCKET_ID:  0x133_DPC_dxgkrnl!DpiFdoDpcForIsr

OS_VERSION:  10.0.19041.1

BUILDLAB_STR:  vb_release

OSPLATFORM_TYPE:  x64

OSNAME:  Windows 10

FAILURE_ID_HASH:  {ba837505-1263-7a6a-27ed-8858d50757c2}

 # Child-SP          RetAddr               Call Site

00 ffffaf00cdd3fe18 fffff8021e287372     nt!KeBugCheckEx

01 ffffaf00cdd3fe20 fffff8021e126853     nt!KeAccumulateTicks+0x15e2c2

02 ffffaf00cdd3fe80 fffff8021e12633a     nt!KeClockInterruptNotify+0x453

03 ffffaf00cdd3ff30 fffff8021e02ecd5     nt!HalpTimerClockIpiRoutine+0x1a

04 ffffaf00cdd3ff60 fffff8021e1f6cba     nt!KiCallInterruptServiceRoutine+0xa5

05 ffffaf00cdd3ffb0 fffff8021e1f7227     nt!KiInterruptSubDispatchNoLockNoEtw+0xfa

06 ffffd001f5430970 fffff8021e10bbd0     nt!KiInterruptDispatchNoLockNoEtw+0x37

07 ffffd001f5430b00 fffff8021e109c53     nt!KxWaitForLockOwnerShip+0x20

08 ffffd001f5430b30 fffff8022b264667     nt!KeAcquireInStackQueuedSpinLockAtDpcLevel+0x73

09 ffffd001f5430b60 fffff8021e0e535e     dxgkrnl!DpiFdoDpcForIsr+0x37

0a ffffd001f5430bb0 fffff8021e0e4644     nt!KiExecuteAllDpcs+0x30e

0b ffffd001f5430d20 fffff8021e1fc255     nt!KiRetireDpcList+0x1f4

0c ffffd001f5430fb0 fffff8021e1fc040     nt!KxRetireDpcList+0x5

0d ffffd001f5baa830 fffff8021e1fb70e     nt!KiDispatchInterruptContinue

0e ffffd001f5baa860 fffff8021e1f648b     nt!KiDpcInterrupt+0x2ee

0f ffffd001f5baa9f0 fffff8022b263b6c     nt!KeSynchronizeExecution+0x5b

10 ffffd001f5baaa30 fffff8022ef7320e     dxgkrnl!DpSynchronizeExecution+0xac

11 ffffd001f5baaa80 fffff8022ef93011     nvlddmkm+0x81320e

12 ffffd001f5baab20 fffff8021e0a29a5     nvlddmkm+0x833011

13 ffffd001f5baac10 fffff8021e1fc868     nt!PspSystemThreadStartup+0x55

14 ffffd001f5baac60 0000000000000000     nt!KiStartSystemThread+0x28

See memory dump file here: https://1drv.ms/u/s!Ap66x80gdX2Blch9aBeiG9IudyWJ-Q?e=VdjPUM

Same thing again with a slightly different outcome: a DPC (this time dxgkrnl!DpiFdoDpcForIsr) waits at DPC level for a spinlock that is not released, with nvlddmkm further down the same stack. That is probably what causes the DPC_WATCHDOG_VIOLATION again.
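To see at a glance which modules take part in a stack like the one above, the frames can be parsed with a few lines of Python. This is only a rough sketch: the helper name `modules_in_stack` and the regular expression are my own, and the excerpt contains just a handful of frames copied from the second dump.

```python
import re

# Frame number, Child-SP, RetAddr, then the call site (module!function+offset).
FRAME_RE = re.compile(
    r"^\s*[0-9a-f]{2}\s+[0-9a-f]{16}\s+[0-9a-f]{16}\s+(\S+)\s*$"
)


def modules_in_stack(stack_text):
    """Return the module names in a 'kb'-style stack, in order of first appearance."""
    modules = []
    for line in stack_text.splitlines():
        match = FRAME_RE.match(line)
        if not match:
            continue  # skip headers and blank lines
        call_site = match.group(1)
        # The module is everything before '!' (symbol) or '+' (raw offset).
        module = re.split(r"[!+]", call_site, maxsplit=1)[0]
        if module not in modules:
            modules.append(module)
    return modules


STACK_EXCERPT = """
07 ffffd001f5430b00 fffff8021e109c53     nt!KxWaitForLockOwnerShip+0x20
08 ffffd001f5430b30 fffff8022b264667     nt!KeAcquireInStackQueuedSpinLockAtDpcLevel+0x73
09 ffffd001f5430b60 fffff8021e0e535e     dxgkrnl!DpiFdoDpcForIsr+0x37
0f ffffd001f5baa9f0 fffff8022b263b6c     nt!KeSynchronizeExecution+0x5b
10 ffffd001f5baaa30 fffff8022ef7320e     dxgkrnl!DpSynchronizeExecution+0xac
11 ffffd001f5baaa80 fffff8022ef93011     nvlddmkm+0x81320e
"""

print(modules_in_stack(STACK_EXCERPT))
```

The list only shows which modules appear on the stack. It does not by itself prove which one holds the lock.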

Memory dump with the oldest available Nvidia driver (440.97):

VIDEO_TDR_FAILURE (116)

Attempt to reset the display driver and recover from timeout failed.

Arguments:

Arg1: ffffe085fbe08010, Optional pointer to internal TDR recovery context (TDR_RECOVERY_CONTEXT).

Arg2: fffff80475374860, The pointer into responsible device driver module (e.g. owner tag).

Arg3: 0000000000000000, Optional error code (NTSTATUS) of the last failed operation.

Arg4: 000000000000000d, Optional internal context dependent data.

SYMBOL_NAME:  nvlddmkm+b24860

MODULE_NAME: nvlddmkm

IMAGE_NAME:  nvlddmkm.sys

STACK_COMMAND:  .thread ; .cxr ; kb

FAILURE_BUCKET_ID:  0x116_IMAGE_nvlddmkm.sys

OS_VERSION:  10.0.19041.1

BUILDLAB_STR:  vb_release

OSPLATFORM_TYPE:  x64

OSNAME:  Windows 10

FAILURE_ID_HASH:  {c89bfe8c-ed39-f658-ef27-f2898997fdbd}

 # Child-SP          RetAddr               Call Site

00 ffffcd8c6e725868 fffff804726014be     nt!KeBugCheckEx

01 ffffcd8c6e725870 fffff80472600b21     dxgkrnl!TdrBugcheckOnTimeout+0xfe

02 ffffcd8c6e7258b0 fffff8047200d683     dxgkrnl!TdrIsRecoveryRequired+0x1b1

03 ffffcd8c6e7258e0 fffff8047206770c     dxgmms2!VidSchiReportHwHang+0x62f

04 ffffcd8c6e7259e0 fffff804720a20d7     dxgmms2!VidSchWaitForCompletionEvent+0x33fec

05 ffffcd8c6e725a60 fffff804720a11ba     dxgmms2!VidSchiWaitForDrainFlipQueue+0x8f

06 ffffcd8c6e725b50 fffff8047205aed0     dxgmms2!VidSchiDrainFlipQueue+0x1a

07 ffffcd8c6e725b80 fffff8047205acfa     dxgmms2!VidSchiRun_PriorityTable+0x1c0

08 ffffcd8c6e725bd0 fffff804660a29a5     dxgmms2!VidSchiWorkerThread+0xca

09 ffffcd8c6e725c10 fffff804661fc868     nt!PspSystemThreadStartup+0x55

0a ffffcd8c6e725c60 0000000000000000     nt!KiStartSystemThread+0x28

See memory dump file here: https://1drv.ms/u/s!Ap66x80gdX2Blch-daA931dp35MDNw?e=StKwhc

In this case I'm less sure about the failure mode. The scheduler seems to wait for the flip queue to drain (VidSchiWaitForDrainFlipQueue), which most probably never completes, so a hardware hang is reported.
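As a small cross-check of the 0x116 arguments, Arg2 and the reported symbol `nvlddmkm+b24860` can be combined to get the load address of the driver image. A minimal sketch, assuming the symbol is always of the form `module+offset` (the helper name `module_base` is hypothetical):

```python
def module_base(address_hex, symbol):
    """Derive a module's base address from an address inside it and 'module+offset'."""
    module, offset_hex = symbol.split("+")
    base = int(address_hex, 16) - int(offset_hex, 16)
    return module, base


# Arg2 and SYMBOL_NAME taken from the VIDEO_TDR_FAILURE dump above.
module, base = module_base("fffff80475374860", "nvlddmkm+b24860")
print(f"{module} base = {base:#018x}")
```

This only confirms that Arg2 points into nvlddmkm.sys, which matches the failure bucket. It says nothing about whether the hardware or the driver is at fault.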

System Specs:

  • Windows insider
  • Operating System: Windows 10 Pro 64-bit (10.0, Build 19042) (19041.vb_release.191206-1406)
  • Mainboard: Gigabyte GA-Z77M-D3H (rev. 1.1) with BIOS version F15a (type: UEFI)
  • CPU: Intel(R) Core(TM) i7-3770K (no overclocking)
  • RAM: Corsair DRAM 2x8GB (16 GB), DDR3 1600 MHz with XMP Profile enabled
  • GPU: EVGA NVIDIA GTX 1080 tested with driver version 457.09 and 440.97
  • Storage: 2x Samsung SSDs with 250 GB via SATA in RAID 0

I hope someone can help me with this. Please let me know if any further information is needed.


Locked Question. This question was migrated from the Microsoft Support Community. You can vote on whether it's helpful, but you can't add comments or replies or follow the question.


6 answers

  1. Anonymous
    2020-11-15T15:01:34+00:00

    Hi Miguel,

    sorry that it took me so long (a lot to do at work).

    I ran FurMark expecting it to crash instantly, but it did not. I tested with different settings and let it run for 20 minutes. No crash, no errors, and no issues. Everything is stable and the temperature caps out at around 84 °C.

    I took a look at the Event Viewer around the times it crashed in the past. There is nothing significant preceding the system crashes.

    I ran Driver Verifier as you instructed, and it crashed right on restart with a DRIVER_VERIFIER_DETECTED_VIOLATION (c4).

    See the memory dump file here: https://1drv.ms/u/s!Ap66x80gdX2BlckQjiZYe1Fz8ZFUWQ?e=TdhMb7

    See the full output here:

    2: kd> !analyze -v

    *******************************************************************************

    *                                                                             *

    *                        Bugcheck Analysis                                    *

    *                                                                             *

    *******************************************************************************

    DRIVER_VERIFIER_DETECTED_VIOLATION (c4)

    A device driver attempting to corrupt the system has been caught.  This is

    because the driver was specified in the registry as being suspect (by the

    administrator) and the kernel has enabled substantial checking of this driver.

    If the driver attempts to corrupt the system, bugchecks 0xC4, 0xC1 and 0xA will

    be among the most commonly seen crashes.

    Arguments:

    Arg1: 0000000000002000, Code Integrity Issue: The caller specified an executable pool type. (Expected: NonPagedPoolNx)

    Arg2: fffff808acfc38c4, The address in the driver's code where the error was detected.

    Arg3: 0000000000000000, Pool Type.

    Arg4: 0000000053694558, Pool Tag (if provided).

    Debugging Details:


    KEY_VALUES_STRING: 1

        Key  : Analysis.CPU.mSec

        Value: 5015

        Key  : Analysis.DebugAnalysisProvider.CPP

        Value: Create: 8007007e on SPEN-PC

        Key  : Analysis.DebugData

        Value: CreateObject

        Key  : Analysis.DebugModel

        Value: CreateObject

        Key  : Analysis.Elapsed.mSec

        Value: 5350

        Key  : Analysis.Memory.CommitPeak.Mb

        Value: 78

        Key  : Analysis.System

        Value: CreateObject

        Key  : WER.OS.Branch

        Value: vb_release

        Key  : WER.OS.Timestamp

        Value: 2019-12-06T14:06:00Z

        Key  : WER.OS.Version

        Value: 10.0.19041.1

    ADDITIONAL_XML: 1

    OS_BUILD_LAYERS: 1

    BUGCHECK_CODE:  c4

    BUGCHECK_P1: 2000

    BUGCHECK_P2: fffff808acfc38c4

    BUGCHECK_P3: 0

    BUGCHECK_P4: 53694558

    BLACKBOXNTFS: 1 (!blackboxntfs)

    PROCESS_NAME:  System

    LOCK_ADDRESS:  fffff8034f64fc80 -- (!locks fffff8034f64fc80)

    Resource @ nt!PiEngineLock (0xfffff8034f64fc80)    Exclusively owned

         Threads: ffffaf8708ec8040-01<*> 

    1 total locks

    PNP_TRIAGE_DATA: 

    Lock address  : 0xfffff8034f64fc80

    Thread Count  : 1

    Thread address: 0xffffaf8708ec8040

    Thread wait   : 0x427

    STACK_TEXT:  

    ffffdd8e7eb2c3e8 fffff8034f3d8e34     : 00000000000000c4 0000000000002000 fffff808acfc38c4 0000000000000000 : nt!KeBugCheckEx

    ffffdd8e7eb2c3f0 fffff8034efa67b5     : fffff8034f61dc2c 0000000000002000 fffff808acfc38c4 0000000000000000 : nt!VerifierBugCheckIfAppropriate+0xe0

    ffffdd8e7eb2c430 fffff8034f3cfdf4     : 0000000053694558 fffff8034f61dc2c fffff808acfc38c4 0000000000000000 : nt!VfReportIssueWithOptions+0x101

    ffffdd8e7eb2c480 fffff8034f3dcff2     : 0000000000000000 ffffaf8708ee3060 0000000000000026 00000000000000c0 : nt!VfCheckPoolType+0x90

    ffffdd8e7eb2c4c0 fffff808acfc38c4     : 0000000000000220 0000000000000000 00000000000000a0 fffff8034f3bc094 : nt!VerifierExAllocatePoolWithTag+0x62

    ffffdd8e7eb2c510 fffff8034ffec2ca     : ffffaf8708619f70 ffffaf8708ee3060 ffffaf870c4aa910 ffffaf870c4aaa60 : SiUSBXp+0x38c4

    ffffdd8e7eb2c670 fffff8034ed75b67     : fffff8034ffec190 0000000000000004 ffffaf8708ee3060 0000000000000000 : VerifierExt!xdv_AddDevice_wrapper+0x13a

    ffffdd8e7eb2c6d0 fffff8034f12703c     : ffffaf870c8e9e10 ffffaf8708bf8690 0000000000000003 ffffaf870873ab00 : nt!PpvUtilCallAddDevice+0x3b

    ffffdd8e7eb2c710 fffff8034f12a82f     : 0000000000000003 0000000000000000 000000006e657050 fffff80300000000 : nt!PnpCallAddDevice+0x94

    ffffdd8e7eb2c7d0 fffff8034f129bb7     : ffffaf870c495aa0 ffffdd8e7eb2ca11 ffffaf870c495aa0 0000000000000000 : nt!PipCallDriverAddDevice+0x827

    ffffdd8e7eb2c990 fffff8034f14c1d4     : ffffaf870c8c5100 fffff8034ece6101 ffffdd8e7eb2cab0 fffff80300000002 : nt!PipProcessDevNodeTree+0x333

    ffffdd8e7eb2ca60 fffff8034ed7b7f6     : 0000000100000003 ffffaf870873abe0 ffffaf870c8c5150 ffffaf870c8c5150 : nt!PiProcessReenumeration+0x88

    ffffdd8e7eb2cab0 fffff8034ed0e4b5     : ffffaf8708ec8040 ffffaf8703890a00 fffff8034f64e4e0 ffffaf8700000000 : nt!PnpDeviceActionWorker+0x206

    ffffdd8e7eb2cb70 fffff8034ecad9a5     : ffffaf8708ec8040 0000000000000080 ffffaf87038c5040 00078405b19bbdff : nt!ExpWorkerThread+0x105

    ffffdd8e7eb2cc10 fffff8034ee07868     : ffff8a8001bda180 ffffaf8708ec8040 fffff8034ecad950 3b3b3b3b3b3b3b3b : nt!PspSystemThreadStartup+0x55

    ffffdd8e7eb2cc60 0000000000000000     : ffffdd8e7eb2d000 ffffdd8e7eb27000 0000000000000000 0000000000000000 : nt!KiStartSystemThread+0x28

    SYMBOL_NAME:  SiUSBXp+38c4

    MODULE_NAME: SiUSBXp

    IMAGE_NAME:  SiUSBXp.sys

    STACK_COMMAND:  .thread ; .cxr ; kb

    BUCKET_ID_FUNC_OFFSET:  38c4

    FAILURE_BUCKET_ID:  0xc4_2000_VRF_SiUSBXp!unknown_function

    OS_VERSION:  10.0.19041.1

    BUILDLAB_STR:  vb_release

    OSPLATFORM_TYPE:  x64

    OSNAME:  Windows 10

    FAILURE_ID_HASH:  {628948ff-6ca9-cb58-bf7a-4538f8f264d3}

    Followup:     MachineOwner


    2: kd> !verifier

    Verify Flags Level 0x03bbedbb

      STANDARD FLAGS:

        [X] (0x00000000) Automatic Checks

        [X] (0x00000001) Special pool

        [X] (0x00000002) Force IRQL checking

        [X] (0x00000008) Pool tracking

        [X] (0x00000010) I/O verification

        [X] (0x00000020) Deadlock detection

        [X] (0x00000080) DMA checking

        [X] (0x00000100) Security checks

        [X] (0x00000800) Miscellaneous checks

        [X] (0x00020000) DDI compliance checking

      ADDITIONAL FLAGS:

        [ ] (0x00000004) Randomized low resources simulation

        [ ] (0x00000200) Force pending I/O requests

        [X] (0x00000400) IRP logging

        [X] (0x00002000) Invariant MDL checking for stack

        [X] (0x00004000) Invariant MDL checking for driver

        [X] (0x00008000) Power framework delay fuzzing

        [X] (0x00010000) Port/miniport interface checking

        [ ] (0x00040000) Systematic low resources simulation

        [X] (0x00080000) DDI compliance checking (additional)

        [X] (0x00200000) NDIS/WIFI verification

        [X] (0x00800000) Kernel synchronization delay fuzzing

        [X] (0x01000000) VM switch verification

        [X] (0x02000000) Code integrity checks

      RESERVED FLAGS (use of these flags is unsupported):

        [X] (0x00100000) Unused or reserved flag

        [X] Indicates flag is enabled

    Summary of All Verifier Statistics

      RaiseIrqls           0x18

      AcquireSpinLocks     0xee09

      Synch Executions     0x0

      Trims                0x15e

      Pool Allocations Attempted             0x23ab

      Pool Allocations Succeeded             0x23ab

      Pool Allocations Succeeded SpecialPool 0x23ab

      Pool Allocations With NO TAG           0x0

      Pool Allocations Failed                0x0

      Current paged pool allocations         0x13 for 000026BA bytes

      Peak paged pool allocations            0x13 for 000026BA bytes

      Current nonpaged pool allocations      0x8f for 0021E332 bytes

      Peak nonpaged pool allocations         0x90 for 0021E71A bytes

      Execute pool type count                0x0

      Execute page protection count          0x0

      Execute page mapping count             0x0

      Execute-Write section count            0x0

      Section alignment failures             0x0

      IAT Executable Section failures:       0x0
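The "Verify Flags Level" value in the `!verifier` output above is a bitmask, and it can be checked against the flag list with a short Python sketch. The flag values and names are copied from that output; the function name `decode_verifier_flags` is made up for illustration.

```python
# Flag values and names as printed by !verifier in the output above.
VERIFIER_FLAGS = {
    0x00000001: "Special pool",
    0x00000002: "Force IRQL checking",
    0x00000004: "Randomized low resources simulation",
    0x00000008: "Pool tracking",
    0x00000010: "I/O verification",
    0x00000020: "Deadlock detection",
    0x00000080: "DMA checking",
    0x00000100: "Security checks",
    0x00000200: "Force pending I/O requests",
    0x00000400: "IRP logging",
    0x00000800: "Miscellaneous checks",
    0x00002000: "Invariant MDL checking for stack",
    0x00004000: "Invariant MDL checking for driver",
    0x00008000: "Power framework delay fuzzing",
    0x00010000: "Port/miniport interface checking",
    0x00020000: "DDI compliance checking",
    0x00040000: "Systematic low resources simulation",
    0x00080000: "DDI compliance checking (additional)",
    0x00100000: "Unused or reserved flag",
    0x00200000: "NDIS/WIFI verification",
    0x00800000: "Kernel synchronization delay fuzzing",
    0x01000000: "VM switch verification",
    0x02000000: "Code integrity checks",
}


def decode_verifier_flags(level):
    """Split the known flags into (enabled, disabled) lists of (bit, name)."""
    enabled = [(bit, name) for bit, name in VERIFIER_FLAGS.items() if level & bit]
    disabled = [(bit, name) for bit, name in VERIFIER_FLAGS.items() if not level & bit]
    return enabled, disabled


enabled, disabled = decode_verifier_flags(0x03BBEDBB)
for bit, name in disabled:
    print(f"not set: {bit:#010x} {name}")
```

The three flags reported as not set are the low resources simulations and "Force pending I/O requests", which matches the settings that were left unchecked when Driver Verifier was configured.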

    Driver Verifier seems to have a problem with this SiUSBXp.sys driver, which seems to be part of the NVIDIA driver if I'm not mistaken. A quick Google search did not turn up anything on how to update this component (or anything else). I have no clue what it is or what it does.
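One more hint is hidden in Arg4 of the c4 bugcheck: the pool tag 0x53694558 is just four ASCII characters packed into a 32-bit value. A minimal sketch for decoding it (the helper name `pool_tag_to_ascii` is hypothetical):

```python
def pool_tag_to_ascii(tag_hex, byteorder="little"):
    """Decode a 32-bit pool tag into its four ASCII characters."""
    value = int(tag_hex, 16)
    return value.to_bytes(4, byteorder).decode("ascii")


# Arg4 from the DRIVER_VERIFIER_DETECTED_VIOLATION dump above.
print(pool_tag_to_ascii("0000000053694558"))          # bytes in memory order
print(pool_tag_to_ascii("0000000053694558", "big"))   # same bytes, reversed
```

Read in memory order the tag is "XEiS", and reversed it is "SiEX", which at least looks consistent with the SiUSBXp module name in the stack. The tag alone does not tell which vendor ships the driver.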

    I'm not sure what to make of that. So does that mean the GPU is fine but there is some problem with the Nvidia driver?

  2. Anonymous
    2020-11-08T20:21:59+00:00

    Hi Miguel,

    sorry that I questioned you.

    I will take a look at the Event Viewer tomorrow and will come back to you when I find something.

    I will also check with the Driver Verifier.

    The issue happens very frequently (multiple times a session) but it is not clearly reproducible.

    I initially thought it had something to do with the network driver since it often happens when I'm loading webpages.

    It happens most frequently when a YouTube video is running in a tab while I'm doing other things. It often crashed while opening a new tab in Edge.

    I guess that stress testing the GPU e.g. with FurMark will also trigger a crash. I can test that. Might be good to have a consistent reproduction.

    I wonder how I could differentiate between hardware failure and driver issues?

    If it is a hardware failure then it is what it is but I wouldn't like to declare my GPU as dead while it is just some software problem with the hardware being fine.

  3. Anonymous
    2020-11-08T17:13:55+00:00

    Hey there Spen!

    Trust me, I did read your post. I gave you instructions for a proper driver reinstallation, and I gave you the steps that NVIDIA itself recommends for this issue. We are not NVIDIA experts, and I think you've taken all the right steps: driver uninstall, reinstall and update, BIOS update, Windows update, OS reinstallation... If none of these work, it must be a hardware issue or an incompatibility.

    The Event Viewer can give information on what exactly happened to cause the issue.

    To see if it's a driver issue, can you run Driver Verifier? Follow these steps:

    1. Make sure you have a Restore Point created before running these steps.
    2. Type 'verifier' in the Windows search menu.
    3. Select "Create custom settings (for code developers)" and click "Next"
    4. Select "Select individual settings from a full list" and click "Next"
    5. Select everything except for "Force Pending I/O Requests" and "Low Resource Simulation" and click "Next".
    6. Select "Select driver names from a list" and click "Next"
    7. Then select all drivers NOT provided by Microsoft and click "Next"
    8. Select "Finish" on the next page.
    9. Reboot your system and use it as you normally do.

    This will show whether the issue actually lies in the NVIDIA driver, in another driver, or in the hardware.

    Let's see if this gives any other clue. You've done a proper troubleshooting process, so if this doesn't give any other clue we may be in trouble.

    Being a Windows Insider shouldn't cause any problem, I think, as long as the problem didn't start after an Insider update. Try rolling back NVIDIA's driver to several other versions, not just one, and see if any of them works properly.

    Have you noticed the problem happening after a specific action, or just randomly?

    I'll be here waiting for you.

    Miguel Ángel :)

  4. Anonymous
    2020-11-08T16:55:58+00:00

    Hi Miguel,

    I don't feel you actually read my post. I still want to give some additional information based on your reply:

    • The GPU is close to reference design and is water cooled.
    • The GPU is not and never was overclocked.
    • The GPU is running in its default configuration.
    • The GPU core temperature is 40-60° C and should be fine.
    • The power supply is a very good one which delivers at least 3 times the power output needed. It should be very stable. There is also no power loss behavior (like instantly turning off).
    • As written in my original post, I already installed the NVIDIA drivers clean and correctly.
    • I have no special programs running in the background.
    • I have no additional task management or optimization tools running.
    • I have no system monitoring tools running.
    • I have no overclocking tools running.
    • No hardware overclocking is active.
    • I have no additional virus scanner besides the Microsoft one.
    • The installed drivers as described in my original post are official drivers from NVIDIA's site.
    • I already used the custom install option to only install the drivers
    • My Windows is up-to-date and as I mentioned I'm also ahead (insider program) - the version I have is also in my original post.

    I don't know what running FurMark should tell me. It will crash with a BSOD, and then? It is already crashing even without FurMark, so running FurMark will not add any information.

    I will look into the Event Viewer. I'm not sure what it should tell me in addition to what the memory dumps already tell me.

    The links are just the Nvidia support pages and so on. So are you saying I should contact them instead of asking for help here?

  5. Anonymous
    2020-11-08T16:13:40+00:00

    Hi there Spen!

    Updated drivers shouldn't give any problem at all. If they do, there must be a problem. I've checked over NVIDIA's forums and they have given these solutions:

    1. Reboot and install the driver from Intel first, reboot and install the NVIDIA driver.
    2. Shut down any unnecessary programs in the background.
    3. If your GPU is overclocked, set it back to the factory clocks.
    4. Shut down any and all system monitoring programs and any overclocking tools you might have.
    5. Shut down your virus scanner.
    6. Install the driver only from the .exe downloaded from https://www.nvidia.com/Download/index.aspx?lang..., don't use GeForce Experience.
    7. Use the custom installation and only install the components you need (don't deselect components you need, the installation program will deinstall them if you do)
    8. Always perform a clean installation. (if you have profiles for games, back them up using "NVIDIA inspector")

    These are taken directly from NVIDIA forums. Please, it's important that you deinstall any program related to NVIDIA to do a clean install.

    Run Windows Update as well, remember it's what may solve our problem as well!

    Also check the Event Viewer. See if there's any critical error at the time of the BSOD. It may help us as well.

    If these don't work, we'll run FurMark to stress the GPU. It could also be a hardware issue.

    You may as well check over the NVIDIA forums and even NVIDIA support in these links:

    https://www.nvidia.com/en-us/support/

    https://www.nvidia.com/en-us/about-nvidia/commu...

    Drivers: https://www.nvidia.com/en-us/geforce/drivers/

    Let me know how it goes!

    Miguel Ángel :)

