iSCSI initiator randomly goes down on backup server, causing total network failure and failed backups

Chayne 1 Reputation point
2022-10-17T13:33:10.003+00:00

Good afternoon all,

I have under my care a Windows Server 2019 Standard machine, version 1809, OS build 17763.3532.
Its main purpose is backup.
It runs Veeam Backup & Replication.

It is directly connected to a cluster of iSCSI switches, with one 10 Gb connection into each switch, set up with NIC Teaming across the two 10 Gb NICs.
We have also tried this with a single 10 Gb NIC and no teaming, and experienced the same issue.

Now onto my issue.

At random, we have a major iSCSI failure which results in a total network loss, which in turn results in failed backups.
I have attempted to get support from Veeam, but as far as they are concerned it's an infrastructure issue and not their problem to deal with.

It all starts with an iScsiPrt error, Event ID 20, in Event Viewer.
Then, in mixed order, we get Event IDs 129, 9, 39, 27, 63, 43 and 10.

In Event Viewer I see those Event IDs in no particular order, and 9 often repeats a few times in a row.
What this represents physically is a total network loss.
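
To look for a pattern in when the outages start, one option is to group the listed Event IDs into bursts and note the start time and first event of each burst. The following is only a minimal sketch: it assumes the events have already been exported from Event Viewer and parsed into (timestamp, event ID) pairs. The sample data, the function names and the 10-minute gap are all hypothetical choices for illustration, not values taken from this server.

```python
from datetime import datetime, timedelta

# Event IDs named in this thread (iScsiPrt 20 plus the IDs that follow it).
WATCHED_IDS = {20, 129, 9, 39, 27, 63, 43, 10}


def group_bursts(events, gap=timedelta(minutes=10)):
    """Group (timestamp, event_id) pairs into bursts.

    A new burst starts whenever more than `gap` has passed since the
    previous watched event. The gap value is an arbitrary assumption.
    """
    watched = sorted((ts, eid) for ts, eid in events if eid in WATCHED_IDS)
    bursts = []
    for ts, eid in watched:
        if bursts and ts - bursts[-1][-1][0] <= gap:
            bursts[-1].append((ts, eid))
        else:
            bursts.append([(ts, eid)])
    return bursts


def summarize(bursts):
    """Return (start time, first event ID, duration, event count) per burst."""
    return [(b[0][0], b[0][1], b[-1][0] - b[0][0], len(b)) for b in bursts]


# Hypothetical sample data, not real log entries.
sample = [
    (datetime(2022, 10, 17, 1, 0, 0), 20),
    (datetime(2022, 10, 17, 1, 0, 5), 129),
    (datetime(2022, 10, 17, 1, 0, 9), 9),
    (datetime(2022, 10, 17, 1, 0, 12), 9),
    (datetime(2022, 10, 17, 1, 2, 0), 7036),  # unrelated ID, ignored
    (datetime(2022, 10, 18, 3, 30, 0), 20),
    (datetime(2022, 10, 18, 3, 31, 0), 39),
]

for start, first_id, duration, count in summarize(group_bursts(sample)):
    print(f"burst at {start}: first ID {first_id}, {count} events over {duration}")
```

Comparing the burst start times against backup job schedules, switch logs or storage maintenance windows may show whether the outages are really random.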

After a few hours the systems seem to stabilise: the errors stop, the network re-establishes,
and the backups pick up where they left off.

We do eventually have successful backups.

This issue occurs at what appear to me to be random times (I cannot find anything solid that pinpoints where it starts).
I am at the end of my tether. I have attempted every fix I could find, including registry edits and, as mentioned above, adding a second 10 Gb NIC.

But the issue keeps recurring.
Please help.

Windows Server 2019

3 answers

  1. 2022-11-19T09:19:47.467+00:00

    Same issue here :-(
    I have been investigating this problem for weeks: updating the Windows Server 2019 OS, network drivers, SAN storage firmware, etc., and following suggestions like disabling VMQ, RSS and RSC, with no success. It always starts suddenly with iScsiPrt error 20, followed by various other iScsiPrt errors and, of course, disk errors for unreachable UPD files (it's a Remote Desktop Session Host VM on Hyper-V) and Ntfs errors for open files which lost their connection...
    The server becomes completely unresponsive and has to be hard reset. Maybe it would recover after waiting for hours; we didn't wait that long.
    Were you able to resolve the problem in the meantime?
    Kind regards

  2. Olaf Engelke 1 Reputation point
    2023-01-19T07:40:31.05+00:00

    Well, I experience the same behavior with newer models (including the newest version) of the Tandberg QuikStation and Windows Server versions from 2012 R2 to 2022, using Backup Exec or Acronis Cyber Protect on-premises, usually connected via a Gigabit network. (The oldest and slowest version of the Tandberg QuikStation does not show this behavior over iSCSI, even when used with the same network connection and server as the newer models.) Tandberg support could never find a solution for this. Using a direct network connection between the HP ProLiant and the QuikStation made no difference either.

    If I manage to disconnect a troublesome iSCSI connection and reconnect it, the connection is usually OK again for the next days or weeks, although not in all cases.

    I have never experienced this issue in iSCSI connections to QNAP or Synology devices.

    It's an annoying, never-ending and expensive struggle.

    My opinion is that something is going wrong in the iSCSI communication (maybe a bug in the iSCSI connector components of Windows Server when communicating with the iSCSI stack in specific hardware). So a solution needs to be found either by Microsoft or by the hardware vendor.

    Best greetings from Germany
    Olaf

  3. ADConnect 5 Reputation points
    2023-11-14T11:59:18.9233333+00:00

    I also have a similar problem: the preferred destinations show IPv6 fe80 addresses in the details, but when I output them via PowerShell I get the correct assignments. It could be that this is just a display error. However, I have not checked whether any data loss occurs.
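
For context, addresses starting with fe80 fall in the IPv6 link-local range (fe80::/10), which is only valid on the local network segment. As a small sketch for checking a list of addresses for link-local entries, assuming the addresses have been copied out by hand (the values and the helper name below are made up for illustration):

```python
import ipaddress


def is_link_local(address):
    """Return True if the address is IPv4 or IPv6 link-local.

    Any zone suffix such as '%12' is stripped before parsing.
    """
    host = address.split("%", 1)[0]
    return ipaddress.ip_address(host).is_link_local


# Hypothetical addresses, not taken from a real system.
portals = ["fe80::1c2d:3e4f:5a6b:7c8d%12", "192.168.130.10", "169.254.10.20"]

for portal in portals:
    kind = "link-local" if is_link_local(portal) else "routable or private"
    print(f"{portal}: {kind}")
```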

    best from germany

    bedir
