VMware/HyperV guest Windows Server 2016 BSOD in tcpip.sys (0xD1: DRIVER_IRQL_NOT_LESS_OR_EQUAL)

Anonymous
2021-08-24T20:08:07.357+00:00

Hi All,

Recently we have been experiencing a BSOD in tcpip.sys on Windows Server 2016 that affects a number of servers in different data centers around the world, all with the same stack trace. The frequency of crashes ranges from once a week to 5-6 times per day per server.

Most affected systems are VMware or Hyper-V guests running Windows Server 2016. All crashes are in tcpip.sys with identical stack traces:

  nt!KeBugCheckEx
  nt!KiBugCheckDispatch+0x69
  nt!KiPageFault+0x428
  tcpip!TcpDequeueTcbSend+0x6e5
  tcpip!TcpTcbFastDatagram+0x2ca
  tcpip!TcpTcbReceive+0x247
  tcpip!TcpMatchReceive+0x1e4
  tcpip!TcpPreValidatedReceive+0x363
  tcpip!IppLoopbackIndicatePackets+0xa7
  tcpip!IppLoopbackTransmit+0xd4
  tcpip!IppLoopbackTransmitWorker+0x2e
  nt!IopProcessWorkItem+0x80
  nt!ExpWorkerThread+0x69f
  nt!PspSystemThreadStartup+0x18a
  nt!KiStartSystemThread+0x16

We managed to isolate and reliably reproduce the issue with a clean Windows Server 2016 installation on VMware Workstation running Axxon Next 4.5.2.

Axxon Next is purely userspace software that does not install any kernel drivers (although it makes heavy use of networking, including the loopback interface) and should not be able to crash the OS. We have a large installation base of Axxon Next 4.5.2 around the world on a variety of Windows versions, but the issue seems to reproduce mostly on Windows Server 2016.

We have researched a number of similar reports of BSODs in tcpip.sys on the web and tried all suggested solutions, but nothing has helped so far.

Any support or help investigating the issue is much appreciated. We are ready to provide remote access or share a VM snapshot where the issue reproduces reliably.

Here is a memory crash dump and initial analysis; see WindowsServer2016_clean_VMware.7z:
https://itvgroup-my.sharepoint.com/:f:/g/personal/oleg_malashenko_ru_axxonsoft_com/EklqcqRieStOl5NUxF-cNRsBEeMJc_78rRSO-3mL_j3pLQ?e=LfKJul


Accepted answer
  1. Anonymous
    2021-08-27T14:48:13.643+00:00

    Hi All,

    For future readers:

    We finalised our investigation and found that using the SIO_LOOPBACK_FAST_PATH Windows socket option triggers a BSOD in Windows Server 2012 R2 and Windows Server 2016. Apparently, Microsoft is well aware of the problem but is not very interested in fixing it.

    Axxon Next used the option via gRPC https://github.com/grpc/grpc/pull/14905

    In gRPC itself, the SIO_LOOPBACK_FAST_PATH problem was discovered and the change reverted fairly quickly, so anyone using gRPC released after March 2019 (>= 1.20.x) should be safe.
    gRPC bug report https://github.com/grpc/grpc/issues/18057
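
The version cut-off above can be turned into a quick check. This is only a minimal sketch: the helper name `may_use_loopback_fast_path` is hypothetical, the 1.20 threshold is taken from this post rather than from gRPC release notes, and the check is deliberately conservative (it flags every release below 1.20, including any that predate the pull request that introduced the option).

```python
import re

# Threshold taken from the post: gRPC >= 1.20.x should no longer set the option.
FIRST_SAFE_RELEASE = (1, 20)


def may_use_loopback_fast_path(version: str) -> bool:
    """Return True if the given gRPC version string is older than 1.20.

    Only the major and minor components are compared, numerically, so
    "1.9.0" is treated as older than "1.20.0".
    """
    match = re.match(r"^v?(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognised gRPC version string: {version!r}")
    major_minor = (int(match.group(1)), int(match.group(2)))
    return major_minor < FIRST_SAFE_RELEASE
```

If the application uses the Python `grpcio` package, the string to pass in would be `grpc.__version__`; for other language bindings, the version has to come from the build or package metadata instead.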

    Thanks all and take care.


2 additional answers

  1. Limitless Technology 39,931 Reputation points
    2021-08-25T11:21:08.497+00:00

    Hello,

    Thank you for your question.

    As you mentioned, this issue seems to be common across a group of servers.

    I believe they are physical servers from Dell, IBM or HP.
    Have you tried involving the vendor to get suggestions, support, or a workaround?

    If the reply was helpful, please don’t forget to upvote or accept as answer.
    Thanks,

    PRAKASH T


  2. Limitless Technology 39,931 Reputation points
    2021-08-25T18:51:37.333+00:00

    Hello @Anonymous,

    This bug check is usually caused by drivers that have used improper addresses.

    Try here: Bug Check 0xD1: DRIVER_IRQL_NOT_LESS_OR_EQUAL

    https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/bug-check-0xd1--driver-irql-not-less-or-equal?redirectedfrom=MSDN

    I would recommend starting by upgrading your drivers and firmware.

    All of the crashes are reported in tcpip.sys, the TCP/IP protocol driver, but that is usually not the true cause. When we see network-related crashes like this, they are usually caused by one of two things:

    1. Network drivers themselves need to be updated.
    2. 3rd party antivirus or firewall software causing NETBIOS conflicts.

    Hope you find it useful.
    Luis P

